Initially used to track and contain disease outbreaks in the 1940s (Metcalf et al., 1995), wastewater analysis has quickly gained traction as a way to monitor other public health measures such as the usage of illicit drugs (Zuccato et al., 2008). Wastewater based epidemiology (WBE) is a chemical analysis technique used to detect a myriad of compounds for regions serviced by wastewater treatment facilities. The functions of this technique range from monitoring environmental impact of household liquid waste to the very topical application of detecting the presence of Covid-19 (Glassmeyer et al., 2005), (Bentacourt et al., 2021).Following the consumption of illicit drugs, the body enzymatically breaks them down into by-products known as drug metabolites. As the process of metabolizing drugs is the same within all humans, the type and quantity of produced metabolites is already known (Concheiro et al., 2007). With this knowledge, scientists are able to estimate the quantity of drugs consumed (drug usage) within a community by measuring the metabolites within the local wastewater (Zuccato et al., 2008). Estimating the use of illicit drugs within a community is a crucial public health matter due to the known harms that regular consumption can cause (Bannwarth et al., 2019). This information can be used to help direct policies and distribute resources to communities in need.
The following analysis is based off a wastewater analysis study which estimated the drug usage of amphetamines, cannabis, cocaine, 3,4-Methylenedioxymethamphetamine (MDMA), and methamphetamine within 137 cities across 29 European countries between 2011 and 2020 (EMCDDA, 2021). In our analysis, we sought to test whether drug usage varies between countries. Previous research suggests that drug usage is related to population-level age demographics; self-reported drug usage tends to be greatest in youth populations (Health Canada, 2010). Consistent with the definition of youth used by the European Monitoring Centre for Drugs and Drug Addiction (EMCDDA), we chose to define youth as those aged 15 to 24. Accordingly, we hypothesized that differences in the relative proportion of youth (expressed as a percentage, PctYouth) would explain differences in drug usage between countries. Previous research also suggests that drug usage is related to population-level wealth demographics (Lipari and Park-Lee, 2019). We chose GDP as a measure of population wealth because it is widely referenced and compared in the literature (Nagelhout et al., 2017). Accordingly, we hypothesized that differences in Gross Domestic Product (GDP) would also explain differences in drug usage between countries.
This analysis is important because it helps contribute to our understanding of the social determinants of health. Our analysis will specifically help improve our understanding of the characteristics of populations that may correspond to increased drug usage.
In an attempt to improve the reliability of the data, only reference countries that had drug usage estimates available for at least six reference cities in the original drug usage data set were considered in our analysis. Thus, the reference countries for our analysis include: Austria, Belgium, Finland, Germany, Slovenia, Spain, Sweden, and Switzerland. The sample for the following analysis was constructed from the observations corresponding to the average weekday drug usage of 5 different types of reference drugs (amphetamines, canabis, cocaine, methamphetamine, and MDMA), measured across reference cities nested within six European reference countries, between the reference years 2011 and 2020. Average weekday usage was chosen because weekend usage data are unavailable for many of the observations in the original drug usage data set. Note that observations corresponding to a number of combinations of type of illicit reference drug, reference city, and reference year are also missing from the original data.
Throughout this document, bold text represents vocabulary terms that have been given specific, operational definitions for the context of the following analysis. These terms and definitions are provided below.
| Term | Definition |
|---|---|
| Drug Usage | Reported for each reference city and reference drug in mg/1000 people/day (EMCDDA, 2021). For additional information, see Appendix A. |
| Gross Domestic Product (GDP) | The sum of gross value added by all resident producers in the economy, plus any product taxes and minus any subsidies not included in the value of products, reported for each country of interest in 2010 United States Dollars (USD). Calculated without making deductions for depreciation of fabricated assets for depletion and degradation of natural resources (World Bank Data, 2021). For more information on why this metric was chosen, see Section 3.3. |
| PctYouth | The number of individuals in the population that fall within the age range 15-24. Reported for each country of interest as a percentage (UN Department of Economic and Social Affairs, 2019). For more information on why this metric was chosen, see Section 3.2. |
| Reference city | The cities across the 6 European countries that drug usage was estimated in. For additional information, see Appendix A. |
| Reference country | The 6 European countries (Austria, Belgium, Finland, Germany, Slovenia, Spain, Sweden, and Switzerland) corresponding to drug usage estimates used in the present analysis. For ease of recognition, the names of these reference countries are presented in italics throughout this document. For more information on how these countries were selected, see Section 1.2. |
| Reference drug | The 5 types of drugs (amphetamines, cannabis, cocaine, MDMA, methamphetamine) corresponding to drug usage estimates used in the present analysis. Throughout this document, the names of these illicit drugs are presented in italics for ease of recognition. For additional information on the types of illicit drugs considered in this analysis, see Appendix B. |
| Reference year | The 10 years (2011-2020) corresponding to drug usage estimates used in the present analysis. For additional information, see Appendix A. |
| Wastewater based epidemiology (WBE) | A chemical analysis technique used to detect compounds in the waters serviced by wastewater treatment facilities. For example, WBE is used in the present analysis to produce estimates of drug usage. For more information on WBE, see Appendix C. |
The following section provides a brief overview of the data acquisition for the drug usage, population age, and population wealth estimates used in our analysis. For more information, see the full project metadata in Appendix A.
Drug usage data was accessed through the European Monitoring Center for Drugs and Drug Addiction website (EMCDDA) (https://www.emcdda.europa.eu/publications/html/pods/waste-water-analysis_en#siteInfoTable). This data set looked at the usage of five common illicit drugs (amphetamines, cannabis, cocaine, methamphetamine, and MDMA), as measured in the wastewater of 137 cities across 29 European countries. The data were collected daily (in mg per 1000 people per day) over a span of 10 years (from 2011 to 2020). The data was averaged for each day of the week, the weekend, and the weekdays for each city over every year considered.
The subset of drug use data of interest is constructed from the mean weekday usage of each reference drug in each combination of reference city, nested within reference country, and reference year. For more information, including why weekday mean values were used and missing data in the original drug use data set, see Section 1.2.
Data on population age demographics were accessed from the United Nations Department of Economic and Social Affairs website (https://population.un.org/wpp/). Data were available from 235 countries and regions over a period of 60 years (from 1950-2020). These data show the proportion of the population accounted for by a number of pre-defined age groups (e.g. youth aged 15-24).
The subset of age data of interest was the proportion of the population aged 15-24 (PctYouth) corresponding to combinations of reference countries and reference years included in our study sample (Section 1.2).
Data on population wealth demographics were accessed via the World Bank website (https://data.worldbank.org/indicator/NY.GDP.MKTP.CD?fbclid=IwAR3tVOmvWo6ijRcavhXuUEhoDZgkVFE4DeiD1CtiZDYYiqpVk8sj0kSxxIY). This data set includes annual Gross Domestic Product (GDP; measured in the value of the 2010 US Dollar) for approximately 200 countries and regions between 1960 and 2020.
The subset of wealth data of interest was the annual Gross Domestic Product (GDP) corresponding to combinations of reference countries and reference years included in our study sample (Section 1.2).
In this section, we set the R Markdown Document (RMD) up for analysis. Setting the document up for analysis involved two steps, loading the dependency libraries, and importing data.
The first step of setting up our RMD was to load the dependency libraries. Descriptions and references for each of these packages are available in Appendix D.
library(ggplot2)
library(dplyr)
library(ggfortify)
library(readr)
library(tidyr)
library(stringr)
library(lubridate)
library(plotrix)
library(lattice)The second step of setting up our RMD was to import the raw data. The raw data files used in this RMD are available as a part of a GitHub Repository for this project (https://github.com/trevor-lyman/term.project).
First, we imported the drug usage data.
# step 1: read .csv files
# 2011
amphetamine_2011 <- read.csv("Data/Drug Metabolite Data/2011/WW-data-amphetamine-2011.csv")
cannabis_2011 <- read.csv("Data/Drug Metabolite Data/2011/WW-data-cannabis-2011.csv")
cocaine_2011 <- read.csv("Data/Drug Metabolite Data/2011/WW-data-cocaine-2011.csv")
MDMA_2011 <- read.csv("Data/Drug Metabolite Data/2011/WW-data-MDMA-2011.csv")
methamphetamine_2011 <- read.csv("Data/Drug Metabolite Data/2011/WW-data-methamphetamine-2011.csv")
# 2012
amphetamine_2012 <- read.csv("Data/Drug Metabolite Data/2012/WW-data-amphetamine-2012.csv")
cannabis_2012 <- read.csv("Data/Drug Metabolite Data/2012/WW-data-cannabis-2012.csv")
cocaine_2012 <- read.csv("Data/Drug Metabolite Data/2012/WW-data-cocaine-2012.csv")
MDMA_2012 <- read.csv("Data/Drug Metabolite Data/2012/WW-data-MDMA-2012.csv")
methamphetamine_2012 <- read.csv("Data/Drug Metabolite Data/2012/WW-data-methamphetamine-2012.csv")
# 2013 -- no cannabis data available
amphetamine_2013 <- read.csv("Data/Drug Metabolite Data/2013/WW-data-amphetamine-2013.csv")
cannabis_2013 <- read.csv("Data/Drug Metabolite Data/2013/WW-data-cannabis-2013.csv")
cocaine_2013 <- read.csv("Data/Drug Metabolite Data/2013/WW-data-cocaine-2013.csv")
MDMA_2013 <- read.csv("Data/Drug Metabolite Data/2013/WW-data-MDMA-2013.csv")
methamphetamine_2013 <- read.csv("Data/Drug Metabolite Data/2013/WW-data-methamphetamine-2013.csv")
# 2014 -- no cannabis data available
amphetamine_2014 <- read.csv("Data/Drug Metabolite Data/2014/WW-data-amphetamine-2014.csv")
cocaine_2014 <- read.csv("Data/Drug Metabolite Data/2014/WW-data-cocaine-2014.csv")
MDMA_2014 <- read.csv("Data/Drug Metabolite Data/2014/WW-data-MDMA-2014.csv")
methamphetamine_2014 <- read.csv("Data/Drug Metabolite Data/2014/WW-data-methamphetamine-2014.csv")
# 2015 -- no cannabis data available
amphetamine_2015 <- read.csv("Data/Drug Metabolite Data/2015/WW-data-amphetamine-2015.csv")
cocaine_2015 <- read.csv("Data/Drug Metabolite Data/2015/WW-data-cocaine-2015.csv")
MDMA_2015 <- read.csv("Data/Drug Metabolite Data/2015/WW-data-MDMA-2015.csv")
methamphetamine_2015 <- read.csv("Data/Drug Metabolite Data/2015/WW-data-methamphetamine-2015.csv")
# 2016 -- no cannabis data available
amphetamine_2016 <- read.csv("Data/Drug Metabolite Data/2016/WW-data-amphetamine-2016.csv")
cocaine_2016 <- read.csv("Data/Drug Metabolite Data/2016/WW-data-cocaine-2016.csv")
MDMA_2016 <- read.csv("Data/Drug Metabolite Data/2016/WW-data-MDMA-2016.csv")
methamphetamine_2016 <- read.csv("Data/Drug Metabolite Data/2016/WW-data-methamphetamine-2016.csv")
# 2017 -- no cannabis data available
amphetamine_2017 <- read.csv("Data/Drug Metabolite Data/2017/WW-data-amphetamine-2017.csv")
cocaine_2017 <- read.csv("Data/Drug Metabolite Data/2017/WW-data-cocaine-2017.csv")
MDMA_2017 <- read.csv("Data/Drug Metabolite Data/2017/WW-data-MDMA-2017.csv")
methamphetamine_2017 <- read.csv("Data/Drug Metabolite Data/2017/WW-data-methamphetamine-2017.csv")
# 2018
amphetamine_2018 <- read.csv("Data/Drug Metabolite Data/2018/WW-data-amphetamine-2018.csv")
cannabis_2018 <- read.csv("Data/Drug Metabolite Data/2018/WW-data-cannabis-2018.csv")
cocaine_2018 <- read.csv("Data/Drug Metabolite Data/2018/WW-data-cocaine-2018.csv")
MDMA_2018 <- read.csv("Data/Drug Metabolite Data/2018/WW-data-MDMA-2018.csv")
methamphetamine_2018 <- read.csv("Data/Drug Metabolite Data/2018/WW-data-methamphetamine-2018.csv")
# 2019
amphetamine_2019 <- read.csv("Data/Drug Metabolite Data/2019/WW-data-amphetamine-2019.csv")
cannabis_2019 <- read.csv("Data/Drug Metabolite Data/2019/WW-data-cannabis-2019.csv")
cocaine_2019 <- read.csv("Data/Drug Metabolite Data/2019/WW-data-cocaine-2019.csv")
MDMA_2019 <- read.csv("Data/Drug Metabolite Data/2019/WW-data-MDMA-2019.csv")
methamphetamine_2019 <- read.csv("Data/Drug Metabolite Data/2019/WW-data-methamphetamine-2019.csv")
# 2020
amphetamine_2020 <- read.csv("Data/Drug Metabolite Data/2020/WW-data-amphetamine-2020.csv")
cannabis_2020 <- read.csv("Data/Drug Metabolite Data/2020/WW-data-cannabis-2020.csv")
cocaine_2020 <- read.csv("Data/Drug Metabolite Data/2020/WW-data-cocaine-2020.csv")
MDMA_2020 <- read.csv("Data/Drug Metabolite Data/2020/WW-data-MDMA-2020.csv")
methamphetamine_2020 <- read.csv("Data/Drug Metabolite Data/2020/WW-data-methamphetamine-2020.csv")
# step 2: check data
head(amphetamine_2011); str(amphetamine_2011) # check the data## SiteID country city Wednesday Thursday Friday Saturday Sunday Monday
## 1 site14 BE Antwerp Zuid 229.63 211.52 318.89 318.79 311.03 272.62
## 2 site17 BE Brussels 22.92 28.77 39.31 57.57 44.62 32.38
## 3 site34 CZ Budweis 20.20 27.07 24.60 29.43 23.50 37.31
## 4 site55 ES Barcelona 7.40 12.14 13.21 25.29 19.50 15.94
## 5 site56 ES Castellon 0.00 0.00 0.00 0.00 0.00 0.00
## 6 site59 ES Santiago 33.99 35.46 41.20 40.04 30.08 NA
## Tuesday Weekday.mean Weekend.mean Daily.mean
## 1 277.78 239.64 305.33 277.18
## 2 22.18 24.63 43.47 35.39
## 3 37.43 28.23 28.71 28.51
## 4 14.41 11.31 18.48 15.41
## 5 0.00 0.00 0.00 0.00
## 6 22.29 30.58 37.11 33.84
## 'data.frame': 14 obs. of 13 variables:
## $ SiteID : chr "site14" "site17" "site34" "site55" ...
## $ country : chr "BE" "BE" "CZ" "ES" ...
## $ city : chr "Antwerp Zuid" "Brussels" "Budweis" "Barcelona" ...
## $ Wednesday : num 229.6 22.9 20.2 7.4 0 ...
## $ Thursday : num 211.5 28.8 27.1 12.1 0 ...
## $ Friday : num 318.9 39.3 24.6 13.2 0 ...
## $ Saturday : num 318.8 57.6 29.4 25.3 0 ...
## $ Sunday : num 311 44.6 23.5 19.5 0 ...
## $ Monday : num 272.6 32.4 37.3 15.9 0 ...
## $ Tuesday : num 277.8 22.2 37.4 14.4 0 ...
## $ Weekday.mean: num 239.6 24.6 28.2 11.3 0 ...
## $ Weekend.mean: num 305.3 43.5 28.7 18.5 0 ...
## $ Daily.mean : num 277.2 35.4 28.5 15.4 0 ...
Next, we imported the population age (PctYouth) data.
raw.PctYouth <- read.csv("Data/Age Data/raw.age.csv", skip = 16) # read .csv file
raw.PctYouth <- raw.PctYouth %>%
select(Region..subregion..country.or.area..,
Reference.date..as.of.1.July.,
Population.aged.15.24)
# select 3 variables of interest
head(raw.PctYouth); str(raw.PctYouth) # check data## Region..subregion..country.or.area.. Reference.date..as.of.1.July.
## 1 WORLD 1950
## 2 WORLD 1951
## 3 WORLD 1952
## 4 WORLD 1953
## 5 WORLD 1954
## 6 WORLD 1955
## Population.aged.15.24
## 1 18.2
## 2 18.1
## 3 18.0
## 4 17.8
## 5 17.7
## 6 17.6
## 'data.frame': 18105 obs. of 3 variables:
## $ Region..subregion..country.or.area..: chr "WORLD" "WORLD" "WORLD" "WORLD" ...
## $ Reference.date..as.of.1.July. : int 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 ...
## $ Population.aged.15.24 : chr "18.2" "18.1" "18.0" "17.8" ...
Finally, we imported the population wealth (GDP) data.
raw.gdp <- as.data.frame(na.omit(pivot_longer(data = read.csv("Data/GDP Data/raw.gdp.csv", skip = 4),
cols = 5:65, names_to = "year", values_to = "GDP")))
#read .csv file; use pivot_longer() to transform to long format
head(raw.gdp); str(raw.gdp) #check data## Country.Name Country.Code Indicator.Name Indicator.Code year GDP
## 1 Aruba ABW GDP (current US$) NY.GDP.MKTP.CD X1986 405463417
## 2 Aruba ABW GDP (current US$) NY.GDP.MKTP.CD X1987 487602458
## 3 Aruba ABW GDP (current US$) NY.GDP.MKTP.CD X1988 596423607
## 4 Aruba ABW GDP (current US$) NY.GDP.MKTP.CD X1989 695304363
## 5 Aruba ABW GDP (current US$) NY.GDP.MKTP.CD X1990 764887117
## 6 Aruba ABW GDP (current US$) NY.GDP.MKTP.CD X1991 872138715
## 'data.frame': 12762 obs. of 6 variables:
## $ Country.Name : chr "Aruba" "Aruba" "Aruba" "Aruba" ...
## $ Country.Code : chr "ABW" "ABW" "ABW" "ABW" ...
## $ Indicator.Name: chr "GDP (current US$)" "GDP (current US$)" "GDP (current US$)" "GDP (current US$)" ...
## $ Indicator.Code: chr "NY.GDP.MKTP.CD" "NY.GDP.MKTP.CD" "NY.GDP.MKTP.CD" "NY.GDP.MKTP.CD" ...
## $ year : chr "X1986" "X1987" "X1988" "X1989" ...
## $ GDP : num 4.05e+08 4.88e+08 5.96e+08 6.95e+08 7.65e+08 ...
## - attr(*, "na.action")= 'omit' Named int [1:3464] 1 2 3 4 5 6 7 8 9 10 ...
## ..- attr(*, "names")= chr [1:3464] "1" "2" "3" "4" ...
After our data was imported into the R environment, our next step was to restructure the data to prepare for analysis. This is often referred to as ‘tidying’ the data (Beckerman et al., 2017). The goal of this section was to build a long-wise table that presented the reference country, reference city, reference year, reference drug, and average weekday drug use for each observation in the sample.
First, we tidied the drug usage data. These data were imported as individual .csv files for each type of drug and reference year in Section 4.2; we wanted to merge these observations into a single dataframe that contained observations of the average weekday drug usage corresponding to each type of drug from each combination of reference country and reference year in the sample. Thus, we took a nested approach to tidying these data. Additionally, the names and structure of the variables in this dataframe needed to be corrected.
First, we tidied each of the data corresponding to observations of the use of a reference drug from a single reference year. We repeated this step for each reference drug for a single reference year.
#step 1: 2011 amphetamine
amphetamine_2011.tidy <- amphetamine_2011 %>% #create data frame
select(c(country, city, Weekday.mean)) %>%
# subset country, city, weekday mean concentration
mutate(year = c(rep(2011, length(amphetamine_2011$country)))) %>%
# add variable for reference year
mutate(drug = c(rep("amphetamine", length(amphetamine_2011$country))))
# add variable for type of drug usage
amphetamine_2011.tidy$country <- as.factor(amphetamine_2011.tidy$country)
# make country a factor
levels(amphetamine_2011.tidy$country) <- c(BE = "Belgium", CZ = "Czech Republic", ES = "Spain",
FR = "France", B = "Great Britain", HR = "Hungary",
IT = "Italy", NL = "Netherlands", NO = "Norway",
SE = "Sweden")
# change country codes to names of countries
head(amphetamine_2011.tidy); str(amphetamine_2011.tidy) # check the data## country city Weekday.mean year drug
## 1 Belgium Antwerp Zuid 239.64 2011 amphetamine
## 2 Belgium Brussels 24.63 2011 amphetamine
## 3 Czech Republic Budweis 28.23 2011 amphetamine
## 4 Spain Barcelona 11.31 2011 amphetamine
## 5 Spain Castellon 0.00 2011 amphetamine
## 6 Spain Santiago 30.58 2011 amphetamine
## 'data.frame': 14 obs. of 5 variables:
## $ country : Factor w/ 10 levels "Belgium","Czech Republic",..: 1 1 2 3 3 3 4 5 6 7 ...
## $ city : chr "Antwerp Zuid" "Brussels" "Budweis" "Barcelona" ...
## $ Weekday.mean: num 239.6 24.6 28.2 11.3 0 ...
## $ year : num 2011 2011 2011 2011 2011 ...
## $ drug : chr "amphetamine" "amphetamine" "amphetamine" "amphetamine" ...
Then, we used these data to build an annual drug usage summary. This is a long-wise table showing the types and amount of drug usage observed in each reference city in the given reference year.
# 2011 cannabis
cannabis_2011.tidy <- cannabis_2011 %>% # create data frame
select(c(country, city, Weekday.mean)) %>%
# subset country, city, weekday mean concentration
mutate(year = c(rep(2011, length(cannabis_2011$country)))) %>%
# add variable for reference year
mutate(drug = c(rep("cannabis", length(cannabis_2011$country))))
# add variable for type of drug usage
cannabis_2011.tidy$country <- as.factor(cannabis_2011.tidy$country)
# make country a factor
levels(cannabis_2011.tidy$country) <- c(BE = "Belgium", CZ = "Czech Republic", ES = "Spain",
FR = "France", HR = "Hungary", IT = "Italy",
NL = "Netherlands", SE = "Sweden")
# change country codes to names of countries
# 2011 cocaine
cocaine_2011.tidy <- cocaine_2011 %>% # create data frame
select(c(country, city, Weekday.mean)) %>%
# subset country, city, weekday mean concentration
mutate(year = c(rep(2011, length(cocaine_2011$country)))) %>%
# add variable for reference year
mutate(drug = c(rep("cocaine", length(cocaine_2011$country))))
# add variable for type of drug usage
cocaine_2011.tidy$country <- as.factor(cocaine_2011.tidy$country)
# make country a factor
levels(cocaine_2011.tidy$country) <- c(BE = "Belgium", CZ = "Czech Republic", ES = "Spain",
FR = "France", GB = "Great Britain", HR = "Hungary",
IT = "Italy", NL = "Netherlands", NO = "Norway", SE = "Sweden")
# change country codes to names of countries
# 2011 MDMA
MDMA_2011.tidy <- MDMA_2011 %>% # create data frame
select(c(country, city, Weekday.mean)) %>%
# subset country, city, weekday mean concentration
mutate(year = c(rep(2011, length(MDMA_2011$country)))) %>%
# add variable for reference year
mutate(drug = c(rep("MDMA", length(MDMA_2011$country))))
# add variable for type of drug metabolie
MDMA_2011.tidy$country <- as.factor(MDMA_2011.tidy$country)
# make country a factor
levels(MDMA_2011.tidy$country) <- c(BE = "Belgium", CZ = "Czech Republic", ES = "Spain", FR = "France",
GB = "Great Britain", HR = "Hungary", IT = "Italy",
NL = "Netherlands", NO = "Norway", SE = "Sweden")
# change country codes to names of countries
methamphetamine_2011.tidy.temp <- methamphetamine_2011 %>% # create data frame
select(c(country, city, methamphetamineWDMean2011)) %>%
# note that weekday mean concentration has a different name in this file than all others
mutate(Weekday.mean = methamphetamine_2011$methamphetamineWDMean2011) %>%
# fix name of weekday mean concentration to be consistent with others
mutate(year = c(rep(2011, length(methamphetamine_2011$country)))) %>%
# add variable for reference year
mutate(drug = c(rep("methamphetamine", length(methamphetamine_2011$country))))
# add variable for type of drug usage
methamphetamine_2011.tidy <- select(methamphetamine_2011.tidy.temp, -c(methamphetamineWDMean2011))
# remove weekday mean variable with incorrect name
methamphetamine_2011.tidy$country <- as.factor(methamphetamine_2011.tidy$country)
# make country a factor
levels(methamphetamine_2011.tidy$country) <- c(BE = "Belgium", CZ = "Czech Republic", ES = "Spain",
FR = "France", GB = "Great Britain", HR = "Hungary",
IT = "Italy", NL = "Netherlands", NO = "Norway",
SE = "Sweden")
# change country codes to names of countries
drug_2011 <- rbind(amphetamine_2011.tidy, cannabis_2011.tidy, cocaine_2011.tidy, MDMA_2011.tidy,
methamphetamine_2011.tidy) # use rbind() to merge all drug usages in a single dataframe
head(drug_2011); str(drug_2011) #check the data## country city Weekday.mean year drug
## 1 Belgium Antwerp Zuid 239.64 2011 amphetamine
## 2 Belgium Brussels 24.63 2011 amphetamine
## 3 Czech Republic Budweis 28.23 2011 amphetamine
## 4 Spain Barcelona 11.31 2011 amphetamine
## 5 Spain Castellon 0.00 2011 amphetamine
## 6 Spain Santiago 30.58 2011 amphetamine
## 'data.frame': 71 obs. of 5 variables:
## $ country : Factor w/ 10 levels "Belgium","Czech Republic",..: 1 1 2 3 3 3 4 5 6 7 ...
## $ city : chr "Antwerp Zuid" "Brussels" "Budweis" "Barcelona" ...
## $ Weekday.mean: num 239.6 24.6 28.2 11.3 0 ...
## $ year : num 2011 2011 2011 2011 2011 ...
## $ drug : chr "amphetamine" "amphetamine" "amphetamine" "amphetamine" ...
After repeating this process for each reference year, we merged these annual drug usage summaries into a long-wise table adding the variable reference year. Finally, we reformatted the data types in the final table.
# 2012
# amphetamine
amphetamine_2012.tidy <- amphetamine_2012 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2012, length(amphetamine_2012$country)))) %>%
mutate(drug = c(rep("amphetamine", length(amphetamine_2012$country))))
amphetamine_2012.tidy$country <- as.factor(amphetamine_2012.tidy$country)
levels(amphetamine_2012$country) <- c(BE = "Belgium", CH = "Switzerland", CZ = "Czech Republic",
ES = "Spain", FI = "Finland", FR = "France", GB = "Great Britain",
HR = "Hungary", IT = "Italy", NL = "Netherlands",
NO = "Norway", SE = "Sweden")
# cannabis
cannabis_2012.tidy <- cannabis_2012 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2012, length(cannabis_2012$country)))) %>%
mutate(drug = c(rep("cannabis", length(cannabis_2012$country))))
cannabis_2012.tidy$country <- as.factor(cannabis_2012.tidy$country)
levels(cannabis_2012.tidy$country) <- c(BE = "Belgium", CZ = "Czech Republic", ES = "Spain",
FR = "France", HR = "Hungary", IT = "Italy",
NL = "Netherlands", NO = "Norway", SE = "Sweden")
# cocaine
cocaine_2012.tidy <- cocaine_2012 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2012, length(cocaine_2012$country)))) %>%
mutate(drug = c(rep("cocaine", length(cocaine_2012$country))))
cocaine_2012.tidy$country <- as.factor(cocaine_2012.tidy$country)
levels(cocaine_2012.tidy$country) <- c(BE = "Belgium", CH = "Switzerland", CZ = "Czech Republic",
ES = "Spain", FI = "Finland", FR = "France", HR = "Hungary",
IT = "Italy", NL = "Netherlands", NO = "Norway", SE = "Sweden")
# MDMA
MDMA_2012.tidy <- MDMA_2012 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2012, length(MDMA_2012$country)))) %>%
mutate(drug = c(rep("MDMA", length(MDMA_2012$country))))
MDMA_2012.tidy$country <- as.factor(MDMA_2012.tidy$country)
levels(MDMA_2012.tidy$country) <- c(BE = "Belgium", CH = "Switzerland", CZ = "Czech Republic",
ES = "Spain", FI = "Finland", HR = "Hungary", IT = "Italy",
NL = "Netherlands", NO = "Norway", SE = "Sweden")
# methamphetamine
methamphetamine_2012.tidy <- methamphetamine_2012 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2012, length(methamphetamine_2012$country)))) %>%
mutate(drug = c(rep("methamphetamine", length(methamphetamine_2012$country))))
methamphetamine_2012.tidy$country <- as.factor(methamphetamine_2012.tidy$country)
levels(methamphetamine_2012.tidy$country) <- c(BE = "Belgium", CH = "Switzerland",
CZ = "Czech Republic", ES = "Spain", FR = "France",
HR = "Hungary", IT = "Italy", NL = "Netherlands",
NO = "Norway", SE = "Sweden")
# 2012 annual summary
drug_2012 <- rbind(amphetamine_2012.tidy, cannabis_2012.tidy, cocaine_2012.tidy, MDMA_2012.tidy,
methamphetamine_2012.tidy)
# 2013
# amphetamine
amphetamine_2013.tidy <- amphetamine_2013 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2013, length(amphetamine_2013$country)))) %>%
mutate(drug = c(rep("amphetamine", length(amphetamine_2013$country))))
amphetamine_2013.tidy$city <- as.factor(amphetamine_2013.tidy$city)
amphetamine_2013.tidy$country <- as.factor(amphetamine_2013.tidy$country)
levels(amphetamine_2013.tidy$country) <- c(BA = "Bosnia", BE = "Belgium", CH = "Switzerland",
CY = "Cyprus", CZ = "Czech Republic", DE = "Germany",
DK = "Denmark", ES = "Spain", FI = "Finland",
FR = "France", GB = "Great Britain", GR = "Greece",
HR = "Hungary", NL = "Netherlands", NO = "Norway",
PT = "Portugal", RO = "Romania", RS = "Serbia",
SE = "Sweden", SK = "Slovakia")
# cannabis
cannabis_2013.tidy <- cannabis_2013 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2013, length(cannabis_2013$country)))) %>%
mutate(drug = c(rep("cannabis", length(cannabis_2013$country))))
cannabis_2013.tidy$city <- as.factor(cannabis_2013.tidy$city)
cannabis_2013.tidy$country <- as.factor(cannabis_2013.tidy$country)
levels(cannabis_2013.tidy$country) <- c(BE = "Belgium", CH = "Switzerland", DE = "Germany",
DK = "Denmark", ES = "Spain", FR = "France", GR = "Greece",
HR = "Hungary", IT = "Italy", NL = "Netherlands",
NO = "Norway", PT = "Portugal")
# cocaine
cocaine_2013.tidy <- cocaine_2013 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2013, length(cocaine_2013$country)))) %>%
mutate(drug = c(rep("cocaine", length(cocaine_2013$country))))
cocaine_2013.tidy$city <- as.factor(cocaine_2013.tidy$city)
cocaine_2013.tidy$country <- as.factor(cocaine_2013.tidy$country)
levels(cocaine_2013.tidy$country) <- c(BA = "Bosnia", BE = "Belgium", CH = "Switzerland", CY = "Cyprus",
CZ = "Czech Republic", DE = "Germany", DK = "Denmark",
ES = "Spain", FR = "France", GB = "Great Britain", GR = "Greece",
HR = "Hungary", IT = "Italy", NL = "Netherlands", NO = "Norway",
PT = "Portugal", RO = "Romania", RS = "Serbia", SE = "Sweden",
SK = "Slovakia")
# MDMA
MDMA_2013.tidy <- MDMA_2013 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2013, length(MDMA_2013$country)))) %>%
mutate(drug = c(rep("MDMA", length(MDMA_2013$country))))
MDMA_2013.tidy$city <- as.factor(MDMA_2013.tidy$city)
MDMA_2013.tidy$country <- as.factor(MDMA_2013.tidy$country)
levels(MDMA_2013.tidy$country) <- c(BA = "Bosnia", BE = "Belgium", CH = "Switzerland", CY = "Cyprus",
CZ = "Czech Republic", DE = "Germany", DK = "Denmark",
ES = "Spain", FI = "Finland", FR = "France", GB = "Great Britain",
GR = "Greece", HR = "Hungary", IT = "Italy", NL = "Netherlands",
NO = "Norway", PT = "Portugal", RO = "Romania", RS = "Serbia",
SE = "Sweden", SK = "Slovakia")
# methamphetamine
methamphetamine_2013.tidy <- methamphetamine_2013 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2013, length(methamphetamine_2013$country)))) %>%
mutate(drug = c(rep("methamphetamine", length(methamphetamine_2013$country))))
methamphetamine_2013.tidy$city <- as.factor(methamphetamine_2013.tidy$city)
methamphetamine_2013.tidy$country <- as.factor(methamphetamine_2013.tidy$country)
levels(methamphetamine_2013.tidy$country) <- c(BA = "Bosnia", BE = "Belgium", CH = "Switzerland",
CY = "Cyprus", CZ = "Czech Republic", DE = "Germany",
DK = "Denmark", ES = "Spain", FR = "France",
GB = "Great Britain", GF = "Greece", IT = "Italy",
NL = "Netherlands", NO = "Norway", PT = "Portugal",
RO = "Romania", RS = "Serbia", SE = "Sweden",
SK = "Slovakia")
# 2013 annual summary
drug_2013 <- rbind(amphetamine_2013.tidy, cannabis_2013.tidy, cocaine_2013.tidy, MDMA_2013.tidy,
methamphetamine_2013.tidy)
# 2014
# amphetamine
amphetamine_2014.tidy <- amphetamine_2014 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2014, length(amphetamine_2014$country)))) %>%
mutate(drug = c(rep("amphetamine", length(amphetamine_2014$country))))
amphetamine_2014.tidy$country <- as.factor(amphetamine_2014.tidy$country)
levels(amphetamine_2014.tidy$country) <- c(BE = "Belgium", CH = "Switzerland", DE = "Germany",
ES = "Spain", FI = "Finland", FR = "France",
GB = "Great Britain", GR = "Greece", HR = "Hungary",
IT = "Italy", NL = "Netherlands", PT = "Portugal")
# cocaine
cocaine_2014.tidy <- cocaine_2014 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2014, length(cocaine_2014$country)))) %>%
mutate(drug = c(rep("cocaine", length(cocaine_2014$country))))
cocaine_2014.tidy$country <- as.factor(cocaine_2014.tidy$country)
levels(cocaine_2014.tidy$country) <- c(BE = "Belgium", CH = "Switzerland", DE = "Germany",
DK = "Denmark", ES = "Spain", FI = "Finland",
FR = "France", GB = "Great Britain", GR = "Greece",
HR = "Hungary", IT = "Italy", NL = "Netherlands",
NO = "Norway", PT = "Portugal")
# MDMA
MDMA_2014.tidy <- MDMA_2014 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2014, length(MDMA_2014$country)))) %>%
mutate(drug = c(rep("MDMA", length(MDMA_2014$country))))
MDMA_2014.tidy$city <- as.factor(MDMA_2014.tidy$city)
MDMA_2014.tidy$country <- as.factor(MDMA_2014.tidy$country)
levels(MDMA_2014.tidy$country) <- c(BE = "Belgium", CH = "Switzerland", DE = "Germany", DK = "Denmark",
ES = "Spain", FI = "Finland", FR = "France", GB = "Great Britain",
GR = "Greece", HR = "Hungary", IT = "Italy", NL = "Netherlands",
NO = "Norway", PT = "Portugal")
# methamphetamine
methamphetamine_2014.tidy <- methamphetamine_2014 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2014, length(methamphetamine_2014$country)))) %>%
mutate(drug = c(rep("methamphetamine", length(methamphetamine_2014$country))))
methamphetamine_2014.tidy$city <- as.factor(methamphetamine_2014.tidy$city)
methamphetamine_2014.tidy$country <- as.factor(methamphetamine_2014.tidy$country)
levels(methamphetamine_2014.tidy$country) <- c(BE = "Belgium", CH = "Switzerland", DE = "Germany",
DK = "Denmark", ES = "Spain", FI = "Finland",
FR = "France", GB = "Great Britain", GR = "Greece",
HR = "Hungary", IT = "Italy", NL = "Netherlands",
NO = "Norway", PT = "Portugal")
# 2014 annual summary
drug_2014 <- rbind(amphetamine_2014.tidy, cocaine_2014.tidy, MDMA_2014.tidy, methamphetamine_2014.tidy)
# 2015
# amphetamine
amphetamine_2015.tidy <- amphetamine_2015 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2015, length(amphetamine_2015$country)))) %>%
mutate(drug = c(rep("amphetamine", length(amphetamine_2015$country))))
amphetamine_2015.tidy$country <- as.factor(amphetamine_2015.tidy$country)
levels(amphetamine_2015.tidy$country) <- c(AT = "Austria", BE = "Belgium", CH = "Switzerland",
CY = "Cyprus", CZ = "Czech Republic", DE = "Germany",
DK = "Denmark", ES = "Spain", FI = "Finland",
FR = "France", GB = "Great Britain", GR = "Greece",
HR = "Hungary", IS = "Iceland", IT = "Italy",
MT = "Malta", NL = "Netherlands", NO = "Norway",
PT = "Portugal", SK = "Slovakia")
# cocaine
cocaine_2015.tidy <- cocaine_2015 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2015, length(cocaine_2015$country)))) %>%
mutate(drug = c(rep("cocaine", length(cocaine_2015$country))))
cocaine_2015.tidy$country <- as.factor(cocaine_2015.tidy$country)
levels(cocaine_2015.tidy$country) <- c(AT = "Austria", BE = "Belgium", CH = "Switzerland",
CY = "Cyprus", CZ = "Czech Republic", DE = "Germany",
DK = "Denmark", ES = "Spain", FI = "Finland", FR = "France",
GB = "Great Britain", GR = "Greece", HR = "Hungary",
IS = "Iceland", IT = "Italy", MT = "Malta", NL = "Netherlands",
NO = "Norway", PT = "Portugal", SK = "Slovakia")
# MDMA
MDMA_2015.tidy <- MDMA_2015 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2015, length(MDMA_2015$country)))) %>%
mutate(drug = c(rep("MDMA", length(MDMA_2015$country))))
MDMA_2015.tidy$country <- as.factor(MDMA_2015.tidy$country)
levels(MDMA_2015.tidy$country) <- c(AT = "Austria", BE = "Belgium", CH = "Switzerland", CY = "Cyprus",
CZ = "Czech Republic", DE = "Germany", DK = "Denmark", ES = "Spain",
FI = "Finland", FR = "France", GB = "Great Britain", GR = "Greece",
HR = "Hungary", IT = "Italy", NL = "Netherlands", NO = "Norway",
PT = "Portugal", SK = "Slovakia")
# methamphetamine
methamphetamine_2015.tidy <- methamphetamine_2015 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2015, length(methamphetamine_2015$country)))) %>%
mutate(drug = c(rep("methamphetamine", length(methamphetamine_2015$country))))
methamphetamine_2015.tidy$country <- as.factor(methamphetamine_2015.tidy$country)
levels(methamphetamine_2015.tidy$country) <- c(AT = "Austria", BE = "Belgium", CH = "Switzerland",
CY = "Cyprus", CZ = "Czech Republic", DE = "Germany",
DK = "Denmark", ES = "Spain", FI = "Finland",
FR = "France", GB = "Great Britain", GR = "Greece",
HR = "Hungary", IS = "Iceland", IT = "Italy",
MT = "Malta", NL = "Netherlands", NO = "Norway",
PT = "Portugal", SK = "Slovakia")
# 2015 annual summary
drug_2015 <- rbind(amphetamine_2015.tidy, cocaine_2015.tidy, MDMA_2015.tidy, methamphetamine_2015.tidy)
# 2016
# amphetamine
amphetamine_2016.tidy <- amphetamine_2016 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2016, length(amphetamine_2016$country)))) %>%
mutate(drug = c(rep("amphetamine", length(amphetamine_2016$country))))
amphetamine_2016.tidy$country <- as.factor(amphetamine_2016.tidy$country)
levels(amphetamine_2016.tidy$country) <- c(AT = "Austria", BE = "Belgium", CH = "Switzerland",
CY = "Cyprus", CZ = "Czech Republic", DE = "Germany",
ES = "Spain", FI = "Finland", FR = "France",
GB = "Great Britain", GR = "Greece", HR = "Hungary",
IS = "Iceland", IT = "Italy", NE = "Netherlands",
NL = "Netherlands", NO = "Norway", PT = "Portugal",
SE = "Sweden", SK = "Slovakia")
# cocaine
cocaine_2016.tidy <- cocaine_2016 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2016, length(cocaine_2016$country)))) %>%
mutate(drug = c(rep("cocaine", length(cocaine_2016$country))))
cocaine_2016.tidy$country <- as.factor(cocaine_2016.tidy$country)
levels(cocaine_2016.tidy$country) <- c(AT = "Austria", BE = "Belgium", CH = "Switzerland",
CY = "Cyprus", CZ = "Czech Republic", DE = "Germany",
ES = "Spain", FI = "Finland", FR = "France",
GB = "Great Britain", GR = "Greece", HR = "Hungary",
IS = "Iceland", IT = "Italy", NL = "Netherlands", NO = "Norway",
PT = "Portugal", SE = "Sweden", SK = "Slovakia")
# MDMA
MDMA_2016.tidy <- MDMA_2016 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2016, length(MDMA_2016$country)))) %>%
mutate(drug = c(rep("MDMA", length(MDMA_2016$country))))
MDMA_2016.tidy$country <- as.factor(MDMA_2016.tidy$country)
levels(MDMA_2016.tidy$country) <- c(AT = "Austria", BE = "Belgium", CH = "Switzerland", CY = "Cyprus",
CZ = "Czech Republic", DE = "Germany", ES = "Spain", FI = "Finland",
FR = "France", GB = "Great Britain", GR = "Greece", HR = "Hungary",
IS = "Iceland", IT = "Italy", NL = "Netherlands", NO = "Norway",
PT = "Portugal", SE = "Sweden", SK = "Slovakia")
# methamphetamine
methamphetamine_2016.tidy <- methamphetamine_2016 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2016, length(methamphetamine_2016$country)))) %>%
mutate(drug = c(rep("methamphetamine", length(methamphetamine_2016$country))))
methamphetamine_2016.tidy$country <- as.factor(methamphetamine_2016.tidy$country)
levels(methamphetamine_2016.tidy$country) <- c(AT = "Austria", BE = "Belgium", CH = "Switzerland",
CY = "Cyprus", CZ = "Czech Republic", DE = "Germany",
ES = "Spain", FI = "Finland", FR = "France",
GB = "Great Britain", GR = "Greece", HR = "Hungary",
IS = "Iceland", IT = "Italy", NL = "Netherlands",
NO = "Norway", PT = "Portugal", SE = "Sweden",
SK = "Slovakia")
# 2016 annual summary
drug_2016 <- rbind(amphetamine_2016.tidy, cocaine_2016.tidy, MDMA_2016.tidy, methamphetamine_2016.tidy)
# 2017
# amphetamine
amphetamine_2017.tidy <- amphetamine_2017 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2017, length(amphetamine_2017$country)))) %>%
mutate(drug = c(rep("amphetamine", length(amphetamine_2017$country))))
amphetamine_2017.tidy$country <- as.factor(amphetamine_2017.tidy$country)
levels(amphetamine_2017.tidy$country) <- c(AT = "Austria", BE = "Belgium", CH = "Switzerland",
CY = "Cyprus", CZ = "Czech Republic", DE = "Germany",
ES = "Spain", FI = "Finland", FR = "France",
GB = "Great Britain", GR = "Greece", HR = "Hungary",
IS = "Iceland", IT = "Italy", LT = "Lithuania",
NL = "Netherlands", NO = "Norway", PL = "Poland",
PT = "Portugal", SI = "Slovenia", SK = "Slovakia")
# cocaine
cocaine_2017.tidy <- cocaine_2017 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2017, length(cocaine_2017$country)))) %>%
mutate(drug = c(rep("cocaine", length(cocaine_2017$country))))
cocaine_2017.tidy$country <- as.factor(cocaine_2017.tidy$country)
levels(cocaine_2017.tidy$country) <- c(AT = "Austria", BE = "Belgium", CH = "Switzerland",
CZ = "Czech Republic", DE = "Germany", ES = "Spain",
FI = "Finland", FR = "France", GB = "Great Britain",
GR = "Greece", HR = "Hungary", IS = "Iceland", IT = "Italy",
LT = "Lithuania", NL = "Netherlands", NO = "Norway",
PT = "Portugal", SI = "Slovenia", SK = "Slovakia")
# MDMA
MDMA_2017.tidy <- MDMA_2017 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2017, length(MDMA_2017$country)))) %>%
mutate(drug = c(rep("MDMA", length(MDMA_2017$country))))
MDMA_2017.tidy$country <- as.factor(MDMA_2017.tidy$country)
levels(MDMA_2017.tidy$country) <- c(AT = "Austria", BE = "Belgium", CH = "Switzerland", CY = "Cyprus",
CZ = "Czech Republic", DE = "Germany", ES = "Spain", FI = "Finland",
FR = "France", GB = "Great Britain", GR = "Greece", HR = "Hungary",
IS = "Iceland", IT = "Italy", LT = "Lithuania", NL = "Netherlands",
NO = "Norway", PL = "Poland", PT = "Portugal", SI = "Slovenia",
SK = "Slovakia")
# methamphetamine
methamphetamine_2017.tidy <- methamphetamine_2017 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2017, length(methamphetamine_2017$country)))) %>%
mutate(drug = c(rep("methamphetamine", length(methamphetamine_2017$country))))
methamphetamine_2017.tidy$country <- as.factor(methamphetamine_2017.tidy$country)
levels(methamphetamine_2017.tidy$country) <- c(AT = "Austria", BE = "Belgium", CH = "Switzerland",
CY = "Cyprus", CZ = "Czech Republic", DE = "Germany",
ES = "Spain", FI = "Finland", FR = "France",
GB = "Great Britain", GR = "Greece", HR = "Hungary",
IT = "Italy", LT = "Lithuania", NL = "Netherlands",
NO = "Norway", PL = "Poland", PT = "Portugal",
SI = "Slovenia", SK = "Slovakia")
# 2017 annual summary
drug_2017 <- rbind(amphetamine_2017.tidy, cocaine_2017.tidy, MDMA_2017.tidy, methamphetamine_2017.tidy)
# 2018
# amphetamine
amphetamine_2018.tidy <- amphetamine_2018 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2018, length(amphetamine_2018$country)))) %>%
mutate(drug = c(rep("amphetamine", length(amphetamine_2018$country))))
amphetamine_2018.tidy$country <- as.factor(amphetamine_2018.tidy$country)
levels(amphetamine_2018.tidy$country) <- c(AT = "Austria", BE = "Belgium", CH = "Switzerland",
CY = "Cyprus", CZ = "Czech Republic", DE = "Germany",
DK = "Denmark", ES = "Spain", FI = "Finland", FR = "France",
GR = "Greece", HR = "Hungary", IS = "Iceland", IT = "Italy",
LT = "Lithuania", NL = "Netherlands", NO = "Norway",
PT = "Portugal", SI = "Slovenia", SK = "Slovakia",
TR = "Turkey")
# cannabis
cannabis_2018.tidy <- cannabis_2018 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2018, length(cannabis_2018$country)))) %>%
mutate(drug = c(rep("cannabis", length(cannabis_2018$country))))
cannabis_2018.tidy$country <- as.factor(cannabis_2018.tidy$country)
levels(cannabis_2018.tidy$country) <- c(AT = "Austria", CZ = "Czech Republic", ES = "Spain",
FR = "France", GR = "Greece", HR = "Hungary", IS = "Iceland",
IT = "Italy", NL = "Netherlands", PT = "Portugal",
SI = "Slovenia", SK = "Slovakia")
# cocaine
cocaine_2018.tidy <- cocaine_2018 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2018, length(cocaine_2018$country)))) %>%
mutate(drug = c(rep("cocaine", length(cocaine_2018$country))))
cocaine_2018.tidy$country <- as.factor(cocaine_2018.tidy$country)
levels(cocaine_2018.tidy$country) <- c(AT = "Austria", BE = "Belgium", CH = "Switzerland",
CZ = "Czech Republic", DE = "Germany", DK = "Denmark",
ES = "Spain", FI = "Finland", FR = "France",
GB = "Great Britain", HR = "Hungary", IS = "Iceland",
IT = "Italy", LT = "Lithuania", NL = "Netherlands",
NO = "Norway", PT = "Portugal", SI = "Slovenia",
SK = "Slovakia")
# MDMA
MDMA_2018.tidy <- MDMA_2018 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2018, length(MDMA_2018$country)))) %>%
mutate(drug = c(rep("MDMA", length(MDMA_2018$country))))
MDMA_2018.tidy$country <- as.factor(MDMA_2018.tidy$country)
levels(MDMA_2018.tidy$country) <- c(AT = "Austria", BE = "Belgium", CH = "Switzerland", CY = "Cyprus",
CZ = "Czech Republic", DE = "Germany", DK = "Denmark", ES = "Spain",
FI = "Finland", FR = "France", GR = "Greece", HR = "Hungary",
IS = "Iceland", IT = "Italy", LT = "Lithuania", NL = "Netherlands",
NO = "Norway", PT = "Portugal", SI = "Slovenia", SK = "Slovakia")
# methamphetamine
methamphetamine_2018.tidy <- methamphetamine_2018 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2018, length(methamphetamine_2018$country)))) %>%
mutate(drug = c(rep("methamphetamine", length(methamphetamine_2018$country))))
methamphetamine_2018.tidy$country <- as.factor(methamphetamine_2018.tidy$country)
levels(methamphetamine_2018.tidy$country) <- c(AT = "Austria", BE = "Belgium", CY = "Cyprus",
CZ = "Czech Republic", DE = "Germany", DK = "Denmark",
ES = "Spain", FI = "Finland", FR = "France",
GB = "Great Britain", HR = "Hungary", IS = "Iceland",
IT = "Italy", LT = "Lithuania", NL = "Netherlands",
NO = "Norway", PT = "Portugal", SI = "Slovenia",
SK = "Slovakia", TR = "Turkey")
# 2018 annual summary
drug_2018 <- rbind(amphetamine_2018.tidy, cannabis_2018.tidy, cocaine_2018.tidy, MDMA_2018.tidy,
methamphetamine_2018.tidy)
# 2019
# amphetamine
amphetamine_2019.tidy <- amphetamine_2019 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2019, length(amphetamine_2019$country)))) %>%
mutate(drug = c(rep("amphetamine", length(amphetamine_2019$country))))
amphetamine_2019.tidy$country <- as.factor(amphetamine_2019.tidy$country)
levels(amphetamine_2019.tidy$country) <- c(AT = "Austria", BE = "Belgium", CH = "Switzerland",
CY = "Cyprus", CZ = "Czech Republic", DE = "Germany",
DK = "Denmark", ES = "Spain", FI = "Finland", FR = "France",
HR = "Hungary", IS = "Iceland", IT = "Italy",
LT = "Lithuania", LV = "Latvia", NL = "Netherlands",
NO = "Norway", PL = "Poland", PT = "Portugal", SE = "Sweden",
SI = "Slovenia", TR = "Turkey")
# cannabis
cannabis_2019.tidy <- cannabis_2019 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2019, length(cannabis_2019$country)))) %>%
mutate(drug = c(rep("cannabis", length(cannabis_2019$country))))
cannabis_2019.tidy$country <- as.factor(cannabis_2019.tidy$country)
levels(cannabis_2019.tidy$country) <- c(AT = "Austria", CH = "Switzerland", CZ = "Czech Republic", ES = "Spain",
FR = "France", GR = "Greece", HR = "Hungary", IS = "Iceland",
IT = "Italy", NL = "Netherlands", PL = "Poland", PT = "Portugal",
SE = "Sweden", SI = "Slovenia", SK = "Slovakia", TR = "Turkey")
# cocaine
cocaine_2019.tidy <- cocaine_2019 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2019, length(cocaine_2019$country)))) %>%
mutate(drug = c(rep("cocaine", length(cocaine_2019$country))))
cocaine_2019.tidy$country <- as.factor(cocaine_2019.tidy$country)
levels(cocaine_2019.tidy$country) <- c(AT = "Austria", BE = "Belgium", CH = "Switzerland", CY = "Cyprus",
CZ = "Czech Republic", DE = "Germany", DK = "Denmark",
ES = "Spain", FI = "Finland", FR = "France",
GB = "Great Britain", GR = "Greece", HR = "Hungary", IS = "Iceland",
IT = "Italy", LT = "Lithuania", LV = "Latvia", NL = "Netherlands",
NO = "Norway", PL = "Poland", PT = "Portugal", SE = "Sweden",
SI = "Slovenia", SK = "Slovakia", TR = "Turkey")
# MDMA
MDMA_2019.tidy <- MDMA_2019 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2019, length(MDMA_2019$country)))) %>%
mutate(drug = c(rep("MDMA", length(MDMA_2019$country))))
MDMA_2019.tidy$country <- as.factor(MDMA_2019.tidy$country)
levels(MDMA_2019.tidy$country) <- c(AT = "Austria", BE = "Belgium", CH = "Switzerland", CY = "Cyprus",
CZ = "Czech Republic", DE = "Germany", DK = "Denmark", ES = "Spain",
FI = "Finland", FR = "France", GB = "Great Britain", GR = "Greece",
HR = "Hungary", IS = "Iceland", IT = "Italy", LT = "Lithuania",
LV = "Latvia", NL = "Netherlands", NO = "Norway", PL = "Poland",
PT = "Portugal", SE = "Sweden", SI = "Slovenia", TR = "Turkey")
# methamphetamine
methamphetamine_2019.tidy <- methamphetamine_2019 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2019, length(methamphetamine_2019$country)))) %>%
mutate(drug = c(rep("methamphetamine", length(methamphetamine_2019$country))))
methamphetamine_2019.tidy$country <- as.factor(methamphetamine_2019.tidy$country)
levels(methamphetamine_2019.tidy$country) <- c(AT = "Austria", BE = "Belgium", CH = "Switzerland",
CY = "Cyprus", CZ = "Czech Republic", DE = "Germany",
DK = "Denmark", ES = "Spain", FI = "Finland", FR = "France",
GB = "Great Britain", GR = "Greece", HR = "Hungary",
IS = "Iceland", IT = "Italy", LT = "Lithuania", LV = "Latvia",
NL = "Netherlands", NO = "Norway", PL = "Poland", PT = "Portugal",
SE = "Sweden", SI = "Slovenia", TR = "Turkey")
# 2019 annual summary
drug_2019 <- rbind(amphetamine_2019.tidy, cannabis_2019.tidy, cocaine_2019.tidy, MDMA_2019.tidy,
methamphetamine_2019.tidy)
# 2020
# amphetamine
amphetamine_2020.tidy <- amphetamine_2020 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2020, length(amphetamine_2020$country)))) %>%
mutate(drug = c(rep("amphetamine", length(amphetamine_2020$country))))
amphetamine_2020.tidy$country <- as.factor(amphetamine_2020.tidy$country)
levels(amphetamine_2020.tidy$country) <- c(AT = "Austria", BE = "Belgium", CH = "Switzerland",
CY = "Cyprus", CZ = "Czech Republic", DE = "Germany",
DK = "Denmark", ES = "Spain", FI = "Finland", FR = "France",
HR = "Hungary", IS = "Iceland", IT = "Italy",
LT = "Lithuania", LV = "Latvia", NL = "Netherlands",
NO = "Norway", PL = "Poland", PT = "Portugal", SE = "Sweden",
SI = "Slovenia", TR = "Turkey")
# cannabis
cannabis_2020.tidy <- cannabis_2020 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2020, length(cannabis_2020$country)))) %>%
mutate(drug = c(rep("cannabis", length(cannabis_2020$country))))
cannabis_2020.tidy$country <- as.factor(cannabis_2020.tidy$country)
levels(cannabis_2020.tidy$country) <- c(AT = "Austria", CH = "Switzerland", CZ = "Czech Republic", ES = "Spain",
FR = "France", GR = "Greece", HR = "Hungary", IS = "Iceland",
IT = "Italy", NL = "Netherlands", PL = "Poland", PT = "Portugal",
SE = "Sweden", SI = "Slovenia", SK = "Slovakia", TR = "Turkey")
# cocaine
cocaine_2020.tidy <- cocaine_2020 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2020, length(cocaine_2020$country)))) %>%
mutate(drug = c(rep("cocaine", length(cocaine_2020$country))))
cocaine_2020.tidy$country <- as.factor(cocaine_2020.tidy$country)
levels(cocaine_2020.tidy$country) <- c(AT = "Austria", BE = "Belgium", CH = "Switzerland", CY = "Cyprus",
CZ = "Czech Republic", DE = "Germany", DK = "Denmark",
ES = "Spain", FI = "Finland", FR = "France",
GB = "Great Britain", GR = "Greece", HR = "Hungary", IS = "Iceland",
IT = "Italy", LT = "Lithuania", LV = "Latvia", NL = "Netherlands",
NO = "Norway", PL = "Poland", PT = "Portugal", SE = "Sweden",
SI = "Slovenia", SK = "Slovakia", TR = "Turkey")
# MDMA
MDMA_2020.tidy <- MDMA_2020 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2020, length(MDMA_2020$country)))) %>%
mutate(drug = c(rep("MDMA", length(MDMA_2020$country))))
MDMA_2020.tidy$country <- as.factor(MDMA_2020.tidy$country)
levels(MDMA_2020.tidy$country) <- c(AT = "Austria", BE = "Belgium", CH = "Switzerland", CY = "Cyprus",
CZ = "Czech Republic", DE = "Germany", DK = "Denmark", ES = "Spain",
FI = "Finland", FR = "France", GB = "Great Britain", GR = "Greece",
HR = "Hungary", IS = "Iceland", IT = "Italy", LT = "Lithuania",
LV = "Latvia", NL = "Netherlands", NO = "Norway", PL = "Poland",
PT = "Portugal", SE = "Sweden", SI = "Slovenia", TR = "Turkey")
# methamphetamine
methamphetamine_2020.tidy <- methamphetamine_2020 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2020, length(methamphetamine_2020$country)))) %>%
mutate(drug = c(rep("methamphetamine", length(methamphetamine_2020$country))))
methamphetamine_2020.tidy$country <- as.factor(methamphetamine_2020.tidy$country)
levels(methamphetamine_2020.tidy$country) <- c(AT = "Austria", BE = "Belgium", CH = "Switzerland",
CY = "Cyprus", CZ = "Czech Republic", DE = "Germany",
DK = "Denmark", ES = "Spain", FI = "Finland", FR = "France",
GB = "Great Britain", GR = "Greece", HR = "Hungary",
IS = "Iceland", IT = "Italy", LT = "Lithuania", LV = "Latvia",
NL = "Netherlands", NO = "Norway", PL = "Poland", PT = "Portugal",
SE = "Sweden", SI = "Slovenia", TR = "Turkey")
# 2020 annual summary
drug_2020 <- rbind(amphetamine_2020.tidy, cannabis_2020.tidy, cocaine_2020.tidy, MDMA_2020.tidy,
methamphetamine_2020.tidy)
# total summary
drug <- rbind(drug_2011, drug_2012, drug_2013, drug_2014, drug_2015, drug_2016, drug_2017, drug_2018,
drug_2019, drug_2020)
# merge with rbind()
drug.tidy <- drug[drug$country == "Austria" | drug$country == "Belgium" | drug$country == "Finland" |
drug$country == "Germany" | drug$country == "Sweden" | drug$country == "Switzerland" |
drug$country == "Spain" | drug$country == "Slovenia",]
# subset 8 countries of interest
drug.tidy$year <- as.integer(drug.tidy$year) # transform year as integer
drug.tidy$drug <- as.factor(drug.tidy$drug) # transform drug as factor
head(drug.tidy); str(drug.tidy) # check the data## country city Weekday.mean year drug
## 1 Belgium Antwerp Zuid 239.64 2011 amphetamine
## 2 Belgium Brussels 24.63 2011 amphetamine
## 4 Spain Barcelona 11.31 2011 amphetamine
## 5 Spain Castellon 0.00 2011 amphetamine
## 6 Spain Santiago 30.58 2011 amphetamine
## 14 Sweden Umea 16.57 2011 amphetamine
## 'data.frame': 1401 obs. of 5 variables:
## $ country : Factor w/ 40 levels "Belgium","Czech Republic",..: 1 1 3 3 3 10 1 3 3 3 ...
## $ city : chr "Antwerp Zuid" "Brussels" "Barcelona" "Castellon" ...
## $ Weekday.mean: num 239.6 24.6 11.3 0 30.6 ...
## $ year : int 2011 2011 2011 2011 2011 2011 2011 2011 2011 2011 ...
## $ drug : Factor w/ 5 levels "amphetamine",..: 1 1 1 1 1 1 2 2 2 2 ...
Next, we tidied the age data. These data were already formatted to include the variables we were interested in: reference country, reference date, and age. However, we needed to rename these data to be consistent. Additionally, these variables were not formatted correctly: reference country should be a factor, reference date should be an integer, and age should be numeric (i.e. quantitative, continuous). Finally, these data also include observations that were outside the scope of the present study.
Thus, we tidied the age data in three steps. First, we renamed the variables; next we reformatted the data types; and finally we removed the observations outside the scope of our study.
# step 1: rename variables
raw.PctYouth.tidy <- raw.PctYouth # create dataframe
colnames(raw.PctYouth.tidy) <- c("country", "year", "PctYouth") # fix col names
# age = percentage of population between the ages of 15 and 24
# step 2: reformat data
raw.PctYouth.tidy$PctYouth <- as.numeric(raw.PctYouth.tidy$PctYouth) # make PctYouth numeric
# NAs produced correspond to reference countries that we will remove (outside of scope of study)
raw.PctYouth.tidy$year <- as.integer(raw.PctYouth.tidy$year) # make year an integer
raw.PctYouth.tidy$country <- as.factor(raw.PctYouth.tidy$country)
# step 3: remove observations
PctYouth <- raw.PctYouth.tidy[raw.PctYouth.tidy$country == "Austria" | raw.PctYouth.tidy$country == "Belgium" |
raw.PctYouth.tidy$country == "Finland" | raw.PctYouth.tidy$country == "Germany" |
raw.PctYouth.tidy$country == "Sweden" | raw.PctYouth.tidy$country == "Switzerland" |
raw.PctYouth.tidy$country == "Spain" | raw.PctYouth.tidy$country == "Slovenia",]
# subset only the 8 reference countries we're interested in
PctYouth.tidy <- PctYouth[PctYouth$year == 2011 | PctYouth$year == 2012 | PctYouth$year == 2013 | PctYouth$year == 2014 |
PctYouth$year == 2015 | PctYouth$year == 2016 | PctYouth$year == 2017 | PctYouth$year == 2018 |
PctYouth$year == 2019 | PctYouth$year == 2020, ]
# subset only the reference dates we're interested in
# check the (tidied) PctYouth data
head(PctYouth.tidy); str(PctYouth.tidy) ## country year PctYouth
## 15895 Finland 2011 12.3
## 15896 Finland 2012 12.2
## 15897 Finland 2013 12.1
## 15898 Finland 2014 12.0
## 15899 Finland 2015 11.9
## 15900 Finland 2016 11.7
## 'data.frame': 80 obs. of 3 variables:
## $ country : Factor w/ 255 levels "Afghanistan",..: 79 79 79 79 79 79 79 79 79 79 ...
## $ year : int 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 ...
## $ PctYouth: num 12.3 12.2 12.1 12 11.9 11.7 11.5 11.3 11.1 11 ...
Then, we tidied the GDP data. We imported these data as a long-wise table in Section 4.2, and these data already included our three variables of interest: reference country, reference year, and GDP (2010 USD). However, these data also included a number of variables (e.g. ‘Indicator.Name’), reference years, and reference countries that are outside the scope of our study. Additionally, the structure of reference country and reference year in this dataframe are incorrectly formatted as characters.
Thus, we tidied the GDP data in three steps. First, we renamed the GDP columns and created a subset of our variables of interest. Next, we removed the observations that were outside the scope of our study. Finally, we reformatted the variable types.
# step 1: rename cols and subset vars of interest
raw.gdp.temp <- na.omit(raw.gdp) #create dataframe
colnames(raw.gdp.temp) <- c("country", "country.code", "indicator.name", "indicator.code", "year", "GDP")
# update column names
raw.gdp.tidy <- raw.gdp.temp %>% select(country, year, GDP)
# subset country, year, GDP
# step 2: remove observations
gdp <- raw.gdp.tidy[raw.gdp.tidy$country == "Austria" | raw.gdp.tidy$country == "Belgium" |
raw.gdp.tidy$country == "Finland" | raw.gdp.tidy$country == "Germany" |
raw.gdp.tidy$country == "Sweden" | raw.gdp.tidy$country == "Switzerland" |
raw.gdp.tidy$country == "Spain" | raw.gdp.tidy$country == "Slovenia",]
# use logical statements to subset 8 countries from sampling frame
gdp.tidy <- as.data.frame(gdp[gdp$year == "X2011" | gdp$year == "X2012" | gdp$year == "X2013" | gdp$year == "X2014" |
gdp$year == "X2015" | gdp$year == "X2016" |
gdp$year == "X2017" | gdp$year == "X2018" |
gdp$year == "X2019" | gdp$year == "X2020",])
# use logical statements to subset 8 years from sampling frame
gdp.tidy$year <- as.factor(gdp.tidy$year) # transform year to factor to fix formatting of values
levels(gdp.tidy$year) <- c(2011, 2012, 2013, 2014, 2015, 2016, 2017,
2018, 2019, 2020) # redefine values
gdp.tidy$year <- as.integer(gdp.tidy$year) # transform back to integer
gdp.tidy["year"] <- gdp.tidy$year + 2010 # add 2010 to fix misread values
# step 3: reformat vars
gdp.tidy$country <- as.factor(gdp.tidy$country)
# transform country to factor
gdp.tidy$year <- as.integer(gdp.tidy$year)
# transform year to integer
# check the data
head(gdp.tidy); str(gdp.tidy) ## country year GDP
## 684 Austria 2011 4.31120e+11
## 685 Austria 2012 4.09425e+11
## 686 Austria 2013 4.30069e+11
## 687 Austria 2014 4.41996e+11
## 688 Austria 2015 3.81818e+11
## 689 Austria 2016 3.95569e+11
## 'data.frame': 80 obs. of 3 variables:
## $ country: Factor w/ 8 levels "Austria","Belgium",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ year : int 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 ...
## $ GDP : num 4.31e+11 4.09e+11 4.30e+11 4.42e+11 3.82e+11 ...
## - attr(*, "na.action")= 'omit' Named int [1:3464] 1 2 3 4 5 6 7 8 9 10 ...
## ..- attr(*, "names")= chr [1:3464] "1" "2" "3" "4" ...
Finally, the last step of tidying the data was to merge the three variables we tidied – drug usage, PctYouth, and GDP – into a single dataframe. Since each dataframe included information about the reference country and reference year corresponding to each observation, we merged the dataframes by a combination of reference country and reference year (i.e. we will match values of age, GDP, and drug usage from each of the respective dataframes by the variables reference country and reference year).
data.temp <- (left_join(y = PctYouth.tidy, x = drug.tidy, by = c('country', 'year')))
# merge 1: age and drug usage data
data <- (left_join(y = gdp.tidy, x = data.temp, by = c('country', 'year')))
# merge 2: result of merge 1 and GDP data
head(data); str(data) # check the data## country city Weekday.mean year drug PctYouth GDP
## 1 Belgium Antwerp Zuid 239.64 2011 amphetamine 12.1 5.22646e+11
## 2 Belgium Brussels 24.63 2011 amphetamine 12.1 5.22646e+11
## 3 Spain Barcelona 11.31 2011 amphetamine 10.2 1.47877e+12
## 4 Spain Castellon 0.00 2011 amphetamine 10.2 1.47877e+12
## 5 Spain Santiago 30.58 2011 amphetamine 10.2 1.47877e+12
## 6 Sweden Umea 16.57 2011 amphetamine 13.5 5.74094e+11
## 'data.frame': 1401 obs. of 7 variables:
## $ country : Factor w/ 269 levels "Belgium","Czech Republic",..: 1 1 3 3 3 10 1 3 3 3 ...
## $ city : chr "Antwerp Zuid" "Brussels" "Barcelona" "Castellon" ...
## $ Weekday.mean: num 239.6 24.6 11.3 0 30.6 ...
## $ year : int 2011 2011 2011 2011 2011 2011 2011 2011 2011 2011 ...
## $ drug : Factor w/ 5 levels "amphetamine",..: 1 1 1 1 1 1 2 2 2 2 ...
## $ PctYouth : num 12.1 12.1 10.2 10.2 10.2 13.5 12.1 10.2 10.2 10.2 ...
## $ GDP : num 5.23e+11 5.23e+11 1.48e+12 1.48e+12 1.48e+12 ...
In the last section, we tidied our data so that it was properly formatted for analysis. Before we proceed with our analysis, it was important to first look at the distributions of variables in our data and to examine the relationships that existed between our variables. Our analysis is centered around an linear model to predict differences in the weekday mean drug usage estimates between reference countries, given population age and wealth demographics. Accordingly, we must ensure that drug usage, PctYouth, and GDP conform to the assumptions of a linear model. Therefore, the following assumptions must be met:
| No. | Assumption | Section |
|---|---|---|
| 1. | Model is appropriate (i.e. drug usage~PctYouth and drug usage~GDP are approximately linear). | Section 6.1 |
| 2. | Normal distribution of residuals. | Section 6.2 |
| 3. | Homogeneity of variance across values of y hat (i.e. model predicted drug use estimates). | Section 6.3 |
| 4. | No major outliers. | Section 6.4 |
In the following section, each of these assumptions of the model are assessed in individual subsections, starting with assumption no. 1. In the fifth and final subsection (Section 6.5), I will ensure that our data are prepared for analysis.
First, we must ensure that the model fit is appropriate for our data. In other words, we want to ensure that population age (i.e. PctYouth) and wealth (i.e. GDP) demographics are linearly related to weekday mean drug usage. We do this using a pair of scatterplots below, shown for PctYouth first (Figure 6.1a) and GDP second (Figure 6.1b). For each scatterplot, the line shown represents the least-squares mean regression (LSMR) line of best fit for the data, and the grey area around the line of best fit represents the corresponding 95% confidence interval. From the scatterplots and regression lines constructed, it appears that a linear relationship is not appropriate for describing the relationship between drug usage and either PctYouth or GDP.
ggplot(data = data, aes(x = PctYouth, y = Weekday.mean)) +
geom_point() + # scatter
geom_smooth(method = "lm", color = "black") + #LSMR line
labs(title = "Figure 6.1a", # add labels
x = "Population Aged 15-24 (Percent)",
y = "Drug Usage (mg/1000 people/day)") +
theme_classic() +
theme(plot.title = element_text(hjust = 0.5)) + # center title
theme(plot.title = element_text(face = "bold")) + # bold title
theme(text = element_text(size=18))
ggplot(data = data, aes(x = GDP, y = Weekday.mean)) +
geom_point() + # scatter
geom_smooth(method = "lm", color = "black") + # LSMR line
labs(title = "Figure 6.1b", # add labels
x = "GDP (USD)",
y = "Drug Usage (mg/1000 people/day)") +
theme_classic() +
theme(plot.title = element_text(hjust = 0.5)) + # center title
theme(plot.title = element_text(face = "bold")) + # bold title
theme(text = element_text(size=18))However, transforming our data may be sufficient to satisfy the condition of appropriate model fit. To explore our options, we examine the effect of a natural log transformation of drug usage and GDP. Note that for drug usage, a vertical translation (10 units) is applied before the log transformation to avoid dividing by 0. As shown in Section 6.4, we chose to transform these variables specifically because they are right-skewed. Again, a pair of scatterplots is shown below, representing a plot of log-transformed drug usage by PctYouth (Figure 6.1c) and a plot of log-transformed drug usage by log-transformed GDP (Figure 6.1d). Again, the LSMR line of best fit, and the corresponding 95% confidence interval, are shown for each plot. As shown below, linear relationships better represent these data following these transformations. Accordingly, we will continue our analysis with the log-transformed versions of drug usage and GDP (i.e. log(drug use) and log(GDP)).
ggplot(data = data, aes(x = PctYouth, y = log(10 + Weekday.mean))) +
geom_point() + # scatter
geom_smooth(method = "lm", color = "black") + #LSMR line
labs(title = "Figure 6.1c", # add labels
x = "Population Aged 15-24 (Percent)",
y = "Drug Usage (mg/1000 people/day)") +
theme_classic() +
theme(plot.title = element_text(hjust = 0.5)) + # center title
theme(plot.title = element_text(face = "bold")) + # bold title
theme(text = element_text(size=18))
ggplot(data = data, aes(x = log(GDP), y = log(10+ Weekday.mean))) +
geom_point() + # scatter
geom_smooth(method = "lm", color = "black") + # LSMR line
labs(title = "Figure 6.1d", # add labels
x = "GDP (USD)",
y = "Drug Usage (mg/1000 people/day)") +
theme_classic() +
theme(plot.title = element_text(hjust = 0.5)) + # center title
theme(plot.title = element_text(face = "bold")) + # bold title
theme(text = element_text(size=18))
Z <- data$Weekday.mean+10
temp <- log(10 + data$Weekday.mean)
data["log.Weekday.mean"] <- temp
data$log.Weekday.mean[is.na(data$log.Weekday.mean)] <- 0
Y <- data$GDP
temp2 <- log(data$GDP)
data["log.GDP"] <- temp2
data$log.GDP[is.na(data$log.GDP)] <- 0Next, we need to assess the normality of residuals produced by the relationships between log(Drug Usage) and PctYouth (Figure 6.2a), and between log(Drug Usage) and log(GDP) (Figure 6.2b). We can do this using two qqplots, corresponding to each of the explanatory variables. As shown below, these plots suggest that the residuals are approximately normally distributed for both of these relationships. Therefore, these plots suggest that the assumption of normality of residuals is met.
mod1 <- lm(log.Weekday.mean~PctYouth,data=data)
fig6.2a <- autoplot(mod1, smooth.colour=NA) + theme_bw() + theme(panel.border = element_blank(), panel.grid.major = element_blank(),
panel.grid.minor = element_blank(), axis.line = element_line(colour = "black"))
fig6.2a@plots[[2]] + ggtitle("Figure 6.2a") + theme(plot.title = element_text(hjust = 0.5)) + # center title
theme(plot.title = element_text(face = "bold")) + # bold title
theme(text = element_text(size=18))
mod2 <- lm(log.Weekday.mean~log.GDP,data=data)
fig6.2b <- autoplot(mod2, smooth.colour=NA) + theme_bw() + theme(panel.border = element_blank(), panel.grid.major = element_blank(),
panel.grid.minor = element_blank(), axis.line = element_line(colour = "black"))
fig6.2b@plots[[2]] + ggtitle("Figure 6.2b") + theme(plot.title = element_text(hjust = 0.5)) + # center title
theme(plot.title = element_text(face = "bold")) + # bold title
theme(text = element_text(size=18))Next, we must asses the homogeneity of variance across the model predicted drug use estimates, according to both PctYouth (Figure 6.3 a) and log(GDP) (Figure 6.3b). To do this, we assess the scale-location plots corresponding to each of these relationships. As shown below, these plots suggest that both relationships approximately conform to the homogeneity of variance.
fig6.3a <- autoplot(mod1, smooth.colour=NA) + theme_bw() + theme(panel.border = element_blank(), panel.grid.major = element_blank(),
panel.grid.minor = element_blank(), axis.line = element_line(colour = "black"))
fig6.3a@plots[[3]] + ggtitle("Figure 6.3a") + theme(plot.title = element_text(hjust = 0.5)) + # center title
theme(plot.title = element_text(face = "bold")) + # bold title
theme(text = element_text(size=18))
fig6.3b <- autoplot(mod2, smooth.colour=NA) + theme_bw() + theme(panel.border = element_blank(), panel.grid.major = element_blank(),
panel.grid.minor = element_blank(), axis.line = element_line(colour = "black"))
fig6.3b@plots[[3]] + ggtitle("Figure 6.3b") + theme(plot.title = element_text(hjust = 0.5)) + # center title
theme(plot.title = element_text(face = "bold")) + # bold title
theme(text = element_text(size=18))To sastisfy the final condition of a linear model, we must assess any potential outliers in the data. To do this, we consider trios of boxplots for each of log(drug use), Pct Youth, and log(GDP). For each variable, one boxplot will represent the distribution of the data, one boxplot will represent the distribution of the data by reference country, and one boxplot will represent the distribution of the data by reference drug.
First, we examine the distributions of log(drug use). log(drug use) is presented in Figure 6.4a, log(drug use) ~ reference country is presented in Figure 6.4b, and log(drug use) ~ reference drug is presented in Figure 6.4c. As shown in Figure 6.4a, there are no potential outliers identified in the boxplot of log(drug use). Although several potential outliers are highlighted in the below plots with open circles, these points are relatively few in number and small in magnitude when compared to distributions of the drug use data before the log-transformation (Appendix E). Thus, these plots suggest the assumption is met for the log(drug use) data.
boxplot(data$log.Weekday.mean, col = NULL, main = "Figure 6.1d",
ylab = "Drug Usage (log(mg/1000 people/day))", range = 1.5) bwplot(data$log.Weekday.mean~data$country, strip = strip.custom(bg = 'white'),
cex = 0.5, cex.lab=1.5, cex.axis=1.5, cex.main=1.5, layout = c(1, 1),
main = "Figure 6.1e",
xlab = "Reference Country", ylab = "Drug Usage (log(mg/1000 people/day))",
par.settings = list(
box.rectangle = list(col = 1),
box.umbrella = list(col = 1),
plot.symbol = list(col=1),
plot.symbol = list(cex = .5)),
scales = list(x = list(relation = "same"),
y = list(relation = "same")))
bwplot(data$log.Weekday.mean~data$drug, strip = strip.custom(bg = 'white'),
cex = .5, layout = c(1, 1),
main = "Figure 6.1f",
xlab = "Reference Drug", ylab = "Drug Usage (log(mg/1000 people/day))",
par.settings = list(
box.rectangle = list(col = 1),
box.umbrella = list(col = 1),
plot.symbol = list(col=1),
plot.symbol = list(cex = .5)),
scales = list(x = list(relation = "same"),
y = list(relation = "same"))) Next, we examine the distributions of PctYouth. PctYouth is presented in Figure 6.4d, PctYouth ~ reference country is presented in Figure 6.4e, and PctYouth ~ reference drug is presented in Figure 6.4f. As shown in Figure 6.4d, there is one potential outliers identified in the boxplot of PctYouth. Again, these potential outliers are highlighted with open circles, and these points are relatively few in number and small in magnitude. Thus, these plots suggest the assumption is met for the PctYouth data.
boxplot(data$PctYouth, col = NULL, main = "Figure 6.2a",
ylab = "Population Aged 15-24 (Percent)", range = 1.5) bwplot(data$PctYouth~data$country, strip = strip.custom(bg = 'white'),
cex = 0.5, cex.lab=1.5, cex.axis=1.5, cex.main=1.5, layout = c(1, 1),
main = "Figure 6.2b",
xlab = "Reference Country", ylab = "Population Aged 15-24 (Percent)",
par.settings = list(
box.rectangle = list(col = 1),
box.umbrella = list(col = 1),
plot.symbol = list(col=1),
plot.symbol = list(cex = .5)),
scales = list(x = list(relation = "same"),
y = list(relation = "same")))
bwplot(data$PctYouth~data$drug, strip = strip.custom(bg = 'white'),
cex = .5, layout = c(1, 1),
main = "Figure 6.2c",
xlab = "Reference Drug", ylab = "Population Aged 15-24 (Percent)",
par.settings = list(
box.rectangle = list(col = 1),
box.umbrella = list(col = 1),
plot.symbol = list(col=1),
plot.symbol = list(cex = .5)),
scales = list(x = list(relation = "same"),
y = list(relation = "same"))) Finally, we assess the normality of log(GDP). log(GDP) is presented in Figure 6.4g, log(GDP) ~ reference country is presented in Figure 6.4h, and log(GDP) ~ reference drug is presented in Figure 6.4i. As shown in Figure 6.4g, there are no potential outliers identified in the boxplot of PctYouth. Potential outliers shown in Figure 6.4h and Figure 6.4i, highlighted with open circles, are relatively few in number and small in magnitude. Thus, these plots suggest the assumption is met for the log(GDP) data.
boxplot(data$log.GDP, col = NULL, main = "Figure 6.3d",
ylab = "GDP (log(USD))", range = 1.5) bwplot(data$log.GDP~data$country, strip = strip.custom(bg = 'white'),
cex = 0.5, cex.lab=1.5, cex.axis=1.5, cex.main=1.5, layout = c(1, 1),
main = "Figure 6.3e",
xlab = "Reference Country", ylab = "GDP (log(USD))",
par.settings = list(
box.rectangle = list(col = 1),
box.umbrella = list(col = 1),
plot.symbol = list(col=1),
plot.symbol = list(cex = .5)),
scales = list(x = list(relation = "same"),
y = list(relation = "same")))
bwplot(data$log.GDP~data$drug, strip = strip.custom(bg = 'white'),
cex = .5, layout = c(1, 1),
main = "Figure 6.3f",
xlab = "Reference Drug", ylab = "GDP (log(USD))",
par.settings = list(
box.rectangle = list(col = 1),
box.umbrella = list(col = 1),
plot.symbol = list(col=1),
plot.symbol = list(cex = .5)),
scales = list(x = list(relation = "same"),
y = list(relation = "same"))) Thus, the above subsections suggest that the assumptions are met to construct a linear model of log(drug usage) by PctYouth and log(GDP). In the following subsection, we finalize the data for analysis.
data <- data %>% select(-Weekday.mean, -GDP) # remove non-transformed variables
head(data); str(data) # check data## country city year drug PctYouth log.Weekday.mean log.GDP
## 1 Belgium Antwerp Zuid 2011 amphetamine 12.1 5.520020 26.98217
## 2 Belgium Brussels 2011 amphetamine 12.1 3.544720 26.98217
## 3 Spain Barcelona 2011 amphetamine 10.2 3.059176 28.02223
## 4 Spain Castellon 2011 amphetamine 10.2 2.302585 28.02223
## 5 Spain Santiago 2011 amphetamine 10.2 3.703275 28.02223
## 6 Sweden Umea 2011 amphetamine 13.5 3.279783 27.07606
## 'data.frame': 1401 obs. of 7 variables:
## $ country : Factor w/ 269 levels "Belgium","Czech Republic",..: 1 1 3 3 3 10 1 3 3 3 ...
## $ city : chr "Antwerp Zuid" "Brussels" "Barcelona" "Castellon" ...
## $ year : int 2011 2011 2011 2011 2011 2011 2011 2011 2011 2011 ...
## $ drug : Factor w/ 5 levels "amphetamine",..: 1 1 1 1 1 1 2 2 2 2 ...
## $ PctYouth : num 12.1 12.1 10.2 10.2 10.2 13.5 12.1 10.2 10.2 10.2 ...
## $ log.Weekday.mean: num 5.52 3.54 3.06 2.3 3.7 ...
## $ log.GDP : num 27 27 28 28 28 ...
In the following section, we proceeded with our analyses to test our hypothesis that the effect of country on drug usage is a function of age and GDP. We do so by following the Plot -> Model -> Check Assumptions -> Interpret -> Plot Again workflow outlined in Beckerman et al. (2017).
First, we look at cocaine usage.
To perform this analysis, we created a subset of only the cocaine data in a new dataframe called cocaineplot.
#Create new dataframe containing only data pertaining to cocaine
cocaineplot <- data %>%
filter(drug == "cocaine") %>%
select(country, PctYouth, log.Weekday.mean, log.GDP)
#check data
glimpse(cocaineplot)## Rows: 331
## Columns: 4
## $ country <fct> "Belgium", "Belgium", "Spain", "Spain", "Spain", "Swe…
## $ PctYouth <dbl> 12.1, 12.1, 10.2, 10.2, 10.2, 13.5, 12.0, 12.0, 12.0,…
## $ log.Weekday.mean <dbl> 6.430622, 5.076111, 5.834226, 5.936850, 5.074048, 2.4…
## $ log.GDP <dbl> 26.98217, 26.98217, 28.02223, 28.02223, 28.02223, 27.…
Here are the summary statistics (mean, sample size, and standard error by country) for the cocaine data.
#Create dataframe with mean cocaine levels, sample size, and standard error by country
plot2 <- cocaineplot %>%
group_by(country) %>%
summarise(
mean = mean(log.Weekday.mean, na.rm = TRUE),
samplesize = n(),
SE = std.error(log.Weekday.mean)
)
plot2## # A tibble: 8 × 4
## country mean samplesize SE
## <fct> <dbl> <int> <dbl>
## 1 Belgium 5.44 47 0.141
## 2 Spain 4.52 66 0.190
## 3 Sweden 3.61 11 0.298
## 4 Switzerland 6.01 49 0.0681
## 5 Finland 2.77 66 0.0744
## 6 Germany 4.55 58 0.134
## 7 Austria 4.77 24 0.152
## 8 Slovenia 5.40 10 0.189
Here, the log(drug), PctYouth, and log(GDP) data associated with cocaine are presented as simple point plots.
For the first plot of reference country vs cocaine consumption, Finland appears to consume the least amount of cocaine while Belgium appears to consume the most. The PctYouth vs cocaine consumption plot appears to show a negative relationship between PctYouth and cocaine consumption. The log(GDP) vs cocaine plot does not appear to show any relationship.
#Plot our cocaine data by country
ggplot(data = cocaineplot, aes(x = country, y = log.Weekday.mean)) +
geom_point(size = 4) +
xlab("Country") +
ylab("Cocaine Consumed (log(mg/1000 people/day))") +
ggtitle("Cocaine Consumed by European Country") +
theme_bw()#Plot our cocaine data by age
ggplot(data = cocaineplot, aes(x = PctYouth, y = log.Weekday.mean)) +
geom_point() +
xlab("Percent Youth (Ages 15-24)") +
ylab("Cocaine Consumed (log(mg/1000 people/day))") +
ggtitle("Cocaine Consumed vs Percent Youth") +
theme_classic()#Plot our cocaine data by GDP
ggplot(data = cocaineplot, aes(x = log.GDP, y = log.Weekday.mean)) +
geom_point() +
xlab("GDP (log(USD))") +
ylab("Cocaine Consumed (log(mg/1000 people/day))") +
ggtitle("Cocaine Consumed vs GDP") +
theme_classic()Here we fit a linear model, where we hypothesize that the effect of reference country on cocaine usage is a function of PctYouth and log(GDP), using the variables from the cocaineplot data frame. To address our hypothesis, we are particularly interested the interaction effects reference country : PctYouth and reference country : log(GDP).
#Create Model
cocmod <- lm(data = cocaineplot, log.Weekday.mean ~ country + PctYouth + log.GDP + country:PctYouth + country:log.GDP)Before we can run the linear model with cocaine data, we first had to verify that the assumptions of this analysis were met. In the residuals vs. fitted plot (top left), there does not appear to be any major hump-shapes or valleys, suggesting the linear model may be a good fit. In the Normal QQ plot (top right), most of the residuals fall on the line suggesting that these data are approximately normal. The scale-location plot (bottom left) does not show any obvious pattern, suggesting that our data have approximately equal variance. Finally, the Constant Leverage plot (bottom right) shows no obvious influential data points. Thus, we were ready to continue with the linear model for cocaine.
#Check assumptions
autoplot(cocmod, smooth.colour = NA)## Warning: Removed 331 row(s) containing missing values (geom_path).
## Warning: Removed 331 row(s) containing missing values (geom_path).
## Warning: Removed 1 rows containing missing values (geom_point).
Here, we produce an ANOVA table and summary table to help interpret our model.
First looking at the ANOVA table produced below, most of the variation in cocaine usage is explained by reference country (Mean sq value of 52.613), and some variation is also explained by PctYouth (Mean sq value of 7.349), log(GDP) (Mean sq value of 8.156), the interaction between country and PctYouth (Mean sq value of 3.970), and the interaction between country and log(GDP) (Mean sq value of 1.849).
Results from the ANOVA table suggest that there is a significant difference in cocaine consumption among the European countries examined in this study, with Switzerland consuming the most cocaine and Finland consuming the least (F=106.7, df=7, p<0.05). PctYouth also had a significant effect on cocaine consumption, with a lower proportion of youth in the population being associated with higher amounts of cocaine usage (F=11.3, df=1, p<0.05). Since there were no significant interaction effects between reference country : PctYouth or reference country : log(GDP), our hypothesis that the effect of reference country on cocaine usage is a function of PctYouth and GDP was not supported.
#Generate ANOVA table and summary
anova(cocmod)## Analysis of Variance Table
##
## Response: log.Weekday.mean
## Df Sum Sq Mean Sq F value Pr(>F)
## country 7 368.29 52.613 61.9014 < 2.2e-16 ***
## PctYouth 1 7.35 7.349 8.6470 0.003525 **
## log.GDP 1 8.16 8.156 9.5961 0.002130 **
## country:PctYouth 7 27.79 3.970 4.6705 5.433e-05 ***
## country:log.GDP 7 12.94 1.849 2.1750 0.036263 *
## Residuals 307 260.93 0.850
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(cocmod)##
## Call:
## lm(formula = log.Weekday.mean ~ country + PctYouth + log.GDP +
## country:PctYouth + country:log.GDP, data = cocaineplot)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.00355 -0.49919 0.00044 0.52605 2.77911
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -17.54079 68.31162 -0.257 0.797524
## countrySpain -261.28924 92.04349 -2.839 0.004831 **
## countrySweden 302.93586 268.21411 1.129 0.259588
## countrySwitzerland -23.44064 176.64377 -0.133 0.894518
## countryFinland 29.02660 86.45787 0.336 0.737303
## countryGermany 31.95908 103.09092 0.310 0.756765
## countryAustria 37.55696 157.32096 0.239 0.811476
## countrySlovenia 471.21647 381.57122 1.235 0.217798
## PctYouth -0.76215 0.49820 -1.530 0.127097
## log.GDP 1.18184 2.49932 0.473 0.636645
## countrySpain:PctYouth 3.54039 0.89985 3.934 0.000103 ***
## countrySweden:PctYouth 0.50043 0.59868 0.836 0.403869
## countrySwitzerland:PctYouth 0.22038 0.61075 0.361 0.718477
## countryFinland:PctYouth -0.08913 0.62017 -0.144 0.885820
## countryGermany:PctYouth -0.67960 1.25419 -0.542 0.588304
## countryAustria:PctYouth 0.65721 0.97971 0.671 0.502838
## countrySlovenia:PctYouth -5.05151 6.61438 -0.764 0.445622
## countrySpain:log.GDP 8.00772 3.38388 2.366 0.018581 *
## countrySweden:log.GDP -11.48903 10.01414 -1.147 0.252159
## countrySwitzerland:log.GDP 0.76289 6.39506 0.119 0.905121
## countryFinland:log.GDP -1.13794 3.17493 -0.358 0.720280
## countryGermany:log.GDP -0.99928 3.55957 -0.281 0.779107
## countryAustria:log.GDP -1.70790 5.69929 -0.300 0.764633
## countrySlovenia:log.GDP -17.17425 13.50627 -1.272 0.204486
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9219 on 307 degrees of freedom
## Multiple R-squared: 0.6193, Adjusted R-squared: 0.5908
## F-statistic: 21.72 on 23 and 307 DF, p-value: < 2.2e-16
Here we replot our cocaine data using predicted values from our linear model
# Plot by country first
# Make new x values with unique country values, and mean age and GDP (standardizes it)
new.x1 <- expand.grid(
country = unique(cocaineplot$country),
PctYouth = mean(cocaineplot$PctYouth),
log.GDP = mean(cocaineplot$log.GDP))
#New y values predicted from the model
new.y1 <- predict(cocmod, newdata = new.x1, interval = "confidence")
#Add the new x and y to a data frame called addThese1
addThese1 <- data.frame(new.x1, new.y1)
#Plot country by cocaine consumption using predicted values
ggplot(data = addThese1, aes(x = country, y = fit)) +
geom_point(size = 4) +
geom_errorbar(data = addThese1, aes(ymin = lwr, ymax = upr), size = 0.4, stat = "identity") +
xlab("Country") +
ylab("Mean Log Cocaine Consumed (log(mg/1000 people/day))") +
ggtitle("Mean Cocaine Consumed by European Country") +
theme_bw()#Now do the same with age
#Make new x values
new.x2 <- expand.grid(
country = unique(cocaineplot$country),
PctYouth = seq(from=min(cocaineplot$PctYouth), to=max(cocaineplot$PctYouth) +1),
log.GDP = mean(cocaineplot$log.GDP))
#New y values predicted from the model
new.y2 <- predict(cocmod, newdata = new.x2, interval = "confidence")
#Add the new x and y to a data frame called addThese2
addThese2 <- data.frame(new.x2, new.y2)
#Plot age vs cocaine consumption including countries
ggplot(data = addThese2, aes(x = PctYouth, y = fit, group = country, color = country)) +
geom_point() +
geom_smooth(method="lm") +
xlab("Percent Youth (Ages 15-24)") +
ylab("Mean Cocaine Consumed (log mg/1000 people/day)") +
ggtitle("Mean Cocaine Consumed vs Percent Youth") +
theme_classic()## `geom_smooth()` using formula 'y ~ x'
Next, we looked at amphetamine usage.
To perform this analysis, we created a subset of only the amphetamine data in a new dataframe called amphetplot.
#Create new dataframe containing only data pertaining to amphetamine
amphetplot <- data %>%
filter(drug == "amphetamine") %>%
select(country, PctYouth, log.Weekday.mean, log.GDP)
#check data
glimpse(amphetplot)## Rows: 327
## Columns: 4
## $ country <fct> "Belgium", "Belgium", "Spain", "Spain", "Spain", "Swe…
## $ PctYouth <dbl> 12.1, 12.1, 10.2, 10.2, 10.2, 13.5, 12.0, 12.0, 12.0,…
## $ log.Weekday.mean <dbl> 5.520020, 3.544720, 3.059176, 2.302585, 3.703275, 3.2…
## $ log.GDP <dbl> 26.98217, 26.98217, 28.02223, 28.02223, 28.02223, 27.…
Here are the summary statistics (mean, sample size, and standard error by country) for the amphetamine data.
#Create dataframe with mean amphetamine levels, sample size, and standard error by country
plot3 <- amphetplot %>%
group_by(country) %>%
summarise(
mean = mean(log.Weekday.mean, na.rm = TRUE),
samplesize = n(),
SE = std.error(log.Weekday.mean)
)
plot3## # A tibble: 8 × 4
## country mean samplesize SE
## <fct> <dbl> <int> <dbl>
## 1 Belgium 4.68 44 0.125
## 2 Spain 3.65 63 0.140
## 3 Sweden 5.16 13 0.327
## 4 Switzerland 3.43 43 0.0939
## 5 Finland 4.17 66 0.0692
## 6 Germany 4.20 58 0.125
## 7 Austria 3.10 24 0.0683
## 8 Slovenia 3.12 16 0.177
Here, the log(drug usage), PctYouth, and log(GDP) data associated with amphetamine are presented as simple point plots.
For the first plot of reference country vs amphetamine consumption, Austria appears to consume the least amount of amphetamine, while Belgium and Germany appear to consume the most. The PctYouth vs amphetamine consumption plot appears to show a positive relationship between PctYouth and amphetamine consumption. The log(GDP) vs amphetamine plot does not appear to show any relationship.
#Plot amphetamine data by country
ggplot(data = amphetplot, aes(x = country, y = log.Weekday.mean)) +
geom_point(size = 4) +
xlab("Country") +
ylab("Amphetamine Consumed (log (mg/1000 people/day))") +
ggtitle("Amphetamine Consumed by European Country") +
theme_bw()#Plot amphetamine data by age
ggplot(data = amphetplot, aes(x = PctYouth, y = log.Weekday.mean)) +
geom_point() +
xlab("Percent Youth (Ages 15-24)") +
ylab("Amphetamine Consumed (log(mg/1000 people/day))") +
ggtitle("Amphetamine Consumed vs Percent Youth") +
theme_classic()#Plot amphetamine data by GDP
ggplot(data = amphetplot, aes(x = log.GDP, y = log.Weekday.mean)) +
geom_point() +
xlab("GDP(log(USD))") +
ylab("Amphetamine Consumed (log(mg/1000 people/day))") +
ggtitle("Amphetamine Consumed vs GDP") +
theme_classic()Here we fit a linear model, where we hypothesize that the effect of country on amphetamine usage is a function of PctYouth and log(GDP), using the variables from the amphetplot data frame. To address our hypothesis, we are particularly interested the interaction effects between reference country : Pct Youth and reference country : log(GDP).
#Create Model
ampmod <- lm(data = amphetplot, log.Weekday.mean ~ country + PctYouth + log.GDP + country:PctYouth + country:log.GDP)Before we can run the linear model with amphetamine data, we first had to verify that the assumptions of this analysis were met. In the residuals vs. fitted plot (top left), there does not appear to be any major hump-shapes or valleys, suggesting the linear model may be a good fit. In the Normal QQ plot (top right), most of the residuals fall on the line suggesting that these data are approximately normal. The scale-location plot (bottom left) does not show any obvious pattern, suggesting that our data have approximately equal variance. Finally, the Constant Leverage plot (bottom right) shows no obvious influential data points. Thus, we were ready to continue with the linear model for amphetamine.
#Check assumptions
autoplot(ampmod, smooth.colour = NA)## Warning: Removed 327 row(s) containing missing values (geom_path).
## Warning: Removed 327 row(s) containing missing values (geom_path).
Here, we produce an ANOVA table and summary table to help interpret our model.
First looking at the ANOVA table produced below, most of the variation in amphetamine usage is explained by reference country (Mean sq value of 10.98), and some variation is also explained by PctYouth (Mean sq value of 2.54). log(GDP) and all variable interactions did not capture any significant amount of variation. With this in mind, we will simplify our model to exclude log(GDP) and the interaction terms in the following subsection.
#Generate ANOVA table and summary
anova(ampmod)## Analysis of Variance Table
##
## Response: log.Weekday.mean
## Df Sum Sq Mean Sq F value Pr(>F)
## country 7 95.004 13.5720 21.9771 < 2.2e-16 ***
## PctYouth 1 21.273 21.2731 34.4476 1.148e-08 ***
## log.GDP 1 0.964 0.9643 1.5615 0.2124
## country:PctYouth 7 7.311 1.0444 1.6912 0.1106
## country:log.GDP 7 3.730 0.5328 0.8627 0.5364
## Residuals 303 187.118 0.6176
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(ampmod)##
## Call:
## lm(formula = log.Weekday.mean ~ country + PctYouth + log.GDP +
## country:PctYouth + country:log.GDP, data = amphetplot)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.1165 -0.5335 0.1072 0.4922 1.8601
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 30.4618 58.2639 0.523 0.6015
## countrySpain 82.8702 78.5346 1.055 0.2922
## countrySweden -339.8859 197.4717 -1.721 0.0862 .
## countrySwitzerland 48.2286 153.9444 0.313 0.7543
## countryFinland -30.8362 73.2539 -0.421 0.6741
## countryGermany -38.4430 87.8976 -0.437 0.6622
## countryAustria -26.3494 134.1149 -0.196 0.8444
## countrySlovenia 127.5023 290.3433 0.439 0.6609
## PctYouth -0.7935 0.4503 -1.762 0.0791 .
## log.GDP -0.6145 2.1349 -0.288 0.7737
## countrySpain:PctYouth -1.7674 0.8356 -2.115 0.0352 *
## countrySweden:PctYouth -0.3437 0.5284 -0.651 0.5159
## countrySwitzerland:PctYouth 0.1084 0.5575 0.194 0.8460
## countryFinland:PctYouth 0.3875 0.5502 0.704 0.4818
## countryGermany:PctYouth -0.3738 1.0795 -0.346 0.7294
## countryAustria:PctYouth 0.1103 0.8484 0.130 0.8966
## countrySlovenia:PctYouth 0.8678 4.6630 0.186 0.8525
## countrySpain:log.GDP -2.4279 2.8949 -0.839 0.4023
## countrySweden:log.GDP 12.7411 7.3737 1.728 0.0850 .
## countrySwitzerland:log.GDP -1.8619 5.5847 -0.333 0.7391
## countryFinland:log.GDP 0.9663 2.6947 0.359 0.7202
## countryGermany:log.GDP 1.4590 3.0373 0.480 0.6313
## countryAustria:log.GDP 0.8552 4.8600 0.176 0.8604
## countrySlovenia:log.GDP -5.6808 10.4862 -0.542 0.5884
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7858 on 303 degrees of freedom
## Multiple R-squared: 0.4067, Adjusted R-squared: 0.3617
## F-statistic: 9.032 on 23 and 303 DF, p-value: < 2.2e-16
Here we simplify our model to exclude GDP and the variable interactions.
Results from the ANOVA table produced below suggest that there is a significant difference in amphetamine consumption among the European countries examined in this study, with Belgium consuming the most amphetamine and Spain consuming the least (F=23.2, df=7, p<0.05). Age also had a significant effect on amphetamine consumption, with a lower proportion of youth in the population being associated with higher amounts of amphetamine usage (F=5.36, df=1, p<0.05). Since there were no significant interaction effects between reference country : PctYouth or reference country : log(GDP), our hypothesis that the effect of reference country on amphetamine usage is a function of PctYouth and GDP was not supported.
#Recreate model without GDP and interaction terms
ampmod2 <- lm(data = amphetplot, log.Weekday.mean ~ country + PctYouth)
#Generate ANOVA table
anova(ampmod2)## Analysis of Variance Table
##
## Response: log.Weekday.mean
## Df Sum Sq Mean Sq F value Pr(>F)
## country 7 95.004 13.5720 21.674 < 2.2e-16 ***
## PctYouth 1 21.273 21.2731 33.973 1.371e-08 ***
## Residuals 318 199.122 0.6262
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#Generate summary
summary(ampmod2)##
## Call:
## lm(formula = log.Weekday.mean ~ country + PctYouth, data = amphetplot)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.33266 -0.56376 0.08089 0.50235 2.07742
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13.8310 1.5747 8.783 < 2e-16 ***
## countrySpain -2.5576 0.3044 -8.403 1.47e-15 ***
## countrySweden 0.4142 0.2501 1.656 0.098699 .
## countrySwitzerland -1.6067 0.1807 -8.891 < 2e-16 ***
## countryFinland -0.5321 0.1541 -3.454 0.000627 ***
## countryGermany -1.3542 0.2178 -6.216 1.60e-09 ***
## countryAustria -2.1264 0.2215 -9.602 < 2e-16 ***
## countrySlovenia -3.5184 0.4080 -8.623 3.15e-16 ***
## PctYouth -0.7876 0.1351 -5.829 1.37e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7913 on 318 degrees of freedom
## Multiple R-squared: 0.3687, Adjusted R-squared: 0.3528
## F-statistic: 23.21 on 8 and 318 DF, p-value: < 2.2e-16
Here we replot our amphetamine data using predicted values from our linear model
# Plot by country first
# Make new x values with unique country values, and mean age and GDP (standardizes it)
new.x3 <- expand.grid(
country = unique(amphetplot$country),
PctYouth = mean(amphetplot$PctYouth),
log.GDP = mean(amphetplot$log.GDP))
#New y values predicted from the model
new.y3 <- predict(ampmod2, newdata = new.x3, interval = "confidence")
#Add the new x and y to a data frame called addThese3
addThese3 <- data.frame(new.x3, new.y3)
#Plot amphetamine consumption by country using predicted values
ggplot(data = addThese3, aes(x = country, y = fit)) +
geom_point(size = 4) +
geom_errorbar(data = addThese3, aes(ymin = lwr, ymax = upr), size = 0.4, stat = "identity") +
xlab("Country") +
ylab("Mean Amphetamine Consumed (log(mg/1000 people/day))") +
ggtitle("Mean Amphetamine Consumed by European Country") +
theme_bw()#Now do the same with age
#Make new x values
new.x4 <- expand.grid(
country = unique(amphetplot$country),
PctYouth = seq(from=min(amphetplot$PctYouth), to=max(amphetplot$PctYouth) +1),
log.GDP = mean(amphetplot$log.GDP))
#New y values predicted from the model
new.y4 <- predict(ampmod2, newdata = new.x4, interval = "confidence")
#Add the new x and y to a data frame called addThese2
addThese4 <- data.frame(new.x4, new.y4)
#Plot age vs cocaine consumption including countries
ggplot(data = addThese4, aes(x = PctYouth, y = fit, group = country, color = country)) +
geom_point() +
geom_smooth(method="lm") +
xlab("Percent Youth (Ages 15-24)") +
ylab("Mean Amphetamine Consumed (log(mg/1000 people/day))") +
ggtitle("Mean Amphetamine Consumed vs Percent Youth") +
theme_classic()## `geom_smooth()` using formula 'y ~ x'
Next, we looked at methamphetamine usage.
To perform this analysis, we created a subset of only the methamphetamine usage data in a new dataframe called methplot.
#Create new dataframe containing only data pertaining to methamphetamine
methplot <- data %>%
filter(drug == "methamphetamine") %>%
select(country, PctYouth, log.Weekday.mean, log.GDP)
#Check data
glimpse(methplot)## Rows: 328
## Columns: 4
## $ country <fct> "Belgium", "Belgium", "Spain", "Spain", "Spain", "Swe…
## $ PctYouth <dbl> 12.1, 12.1, 10.2, 10.2, 10.2, 13.5, 12.0, 12.0, 12.0,…
## $ log.Weekday.mean <dbl> 2.691243, 2.302585, 2.853016, 2.302585, 2.302585, 2.5…
## $ log.GDP <dbl> 26.98217, 26.98217, 28.02223, 28.02223, 28.02223, 27.…
Here are the summary statistics (mean, sample size, and standard error by country) for the methamphetamine data.
#Create table with mean methamphetamine levels, sample size, and standard error by country
plot4 <- methplot %>%
group_by(country) %>%
summarise(
mean = mean(log.Weekday.mean, na.rm = TRUE),
samplesize = n(),
SE = std.error(log.Weekday.mean)
)
plot4## # A tibble: 8 × 4
## country mean samplesize SE
## <fct> <dbl> <int> <dbl>
## 1 Belgium 2.65 47 0.0611
## 2 Spain 2.64 66 0.0622
## 3 Sweden 2.63 15 0.133
## 4 Switzerland 3.09 43 0.0946
## 5 Finland 2.97 64 0.0692
## 6 Germany 3.49 58 0.172
## 7 Austria 2.51 24 0.0308
## 8 Slovenia 2.56 11 0.150
Here, the log(drug usage), PctYouth, and log(GDP) data associated with methamphetamine are presented as simple point plots.
For the first plot of reference country vs methamphetamine consumption, Austria appears to consume the least amount of methamphetamine, while Germany appears to consume the most. The PctYouth vs methamphetamine consumption plot does not appear to show any relationship. The log(GDP) vs methamphetamine plot looks like there may be a positive relationship between log.(GDP) and methamphetamine consumption.
#Plot methamphetamine data by country
ggplot(data = methplot, aes(x = country, y = log.Weekday.mean)) +
geom_point(size = 4) +
xlab("Country") +
ylab("Methamphetamine Consumed (log(mg/1000 people/day))") +
ggtitle("Methamphetamine Consumed by European Country") +
theme_bw()#Plot methamphetamine data by age
ggplot(data = methplot, aes(x = PctYouth, y = log.Weekday.mean)) +
geom_point() +
xlab("Percent Youth (Ages 15-24)") +
ylab("Methamphetamine Consumed (log(mg/1000 people/day))") +
ggtitle("Methamphetamine Consumed vs Percent Youth") +
theme_classic()#Plot methamphetamine data by GDP
ggplot(data = methplot, aes(x = log.GDP, y = log.Weekday.mean)) +
geom_point() +
xlab("GDP (log(USD))") +
ylab("Methamphetamine Consumed (log(mg/1000 people/day))") +
ggtitle("Methamphetamine Consumed vs log GDP") +
theme_classic()Here we fit a linear model, where we hypothesize that the effect of country on methamphetamine usage is a function of PctYouth and log(GDP), using the variables from the methplot data frame. To address our hypothesis, we are particularly interested the interaction effects between reference country : PctYouth and reference country : log(GDP).
#Create Model
methmod <- lm(data = methplot, log.Weekday.mean ~ country + PctYouth + log.GDP + country:PctYouth + country:log.GDP)Before we can run the linear model with methamphetamine data, we first had to verify that the assumptions of this analysis were met. In the residuals vs. fitted plot (top left), there does not appear to be any major hump-shapes or valleys, suggesting the linear model may be a good fit. The Normal QQ plot (top right) isn’t perfect, the positive residuals seem to be larger than expected, however, most of the other residuals fall on the line suggesting that these data are approximately normal. The scale-location plot (bottom left) does not show any obvious pattern, suggesting that our data have approximately equal variance. Finally, the Constant Leverage plot (bottom right) shows no obvious influential data points. Thus, we were ready to continue with the linear model for methamphetamine.
#Check assumptions
autoplot(methmod, smooth.colour = NA)## Warning: Removed 328 row(s) containing missing values (geom_path).
## Warning: Removed 328 row(s) containing missing values (geom_path).
Here, we produce an ANOVA table and summary table to help interpret our model.
First looking at the ANOVA table produced below, most of the variation in methamphetamine usage is explained by reference country (Mean sq value of 2.40). While log(GDP), PctYouth, and all variable interactions did not capture any significant amount of variation. With this in mind, we will simplify our model to exclude log(GDP), PctYouth and the interaction terms in the following section.
#Generate ANOVA table and summary
anova(methmod)## Analysis of Variance Table
##
## Response: log.Weekday.mean
## Df Sum Sq Mean Sq F value Pr(>F)
## country 7 35.208 5.0298 10.1483 2.027e-11 ***
## PctYouth 1 1.346 1.3458 2.7153 0.10043
## log.GDP 1 0.050 0.0500 0.1010 0.75088
## country:PctYouth 7 7.250 1.0357 2.0897 0.04446 *
## country:log.GDP 7 4.839 0.6913 1.3948 0.20690
## Residuals 304 150.671 0.4956
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(methmod)##
## Call:
## lm(formula = log.Weekday.mean ~ country + PctYouth + log.GDP +
## country:PctYouth + country:log.GDP, data = methplot)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.5017 -0.3675 -0.1477 0.2276 2.3204
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -8.86653 52.16471 -0.170 0.8651
## countrySpain -23.93406 70.28705 -0.341 0.7337
## countrySweden 374.61026 199.45175 1.878 0.0613 .
## countrySwitzerland 32.45723 137.16714 0.237 0.8131
## countryFinland 94.93113 66.12107 1.436 0.1521
## countryGermany -30.42431 78.72319 -0.386 0.6994
## countryAustria 42.66473 120.13481 0.355 0.7227
## countrySlovenia 148.62411 281.76235 0.527 0.5982
## PctYouth -0.41878 0.38044 -1.101 0.2719
## log.GDP 0.60808 1.90855 0.319 0.7502
## countrySpain:PctYouth 0.10361 0.68715 0.151 0.8803
## countrySweden:PctYouth 0.82704 0.43764 1.890 0.0597 .
## countrySwitzerland:PctYouth -0.22490 0.47120 -0.477 0.6335
## countryFinland:PctYouth 0.07291 0.48399 0.151 0.8804
## countryGermany:PctYouth -1.68455 0.95773 -1.759 0.0796 .
## countryAustria:PctYouth 0.06624 0.74813 0.089 0.9295
## countrySlovenia:PctYouth -3.31343 4.78856 -0.692 0.4895
## countrySpain:log.GDP 0.77175 2.58403 0.299 0.7654
## countrySweden:log.GDP -14.21898 7.43586 -1.912 0.0568 .
## countrySwitzerland:log.GDP -1.09308 4.97312 -0.220 0.8262
## countryFinland:log.GDP -3.61590 2.42630 -1.490 0.1372
## countryGermany:log.GDP 1.63310 2.71819 0.601 0.5484
## countryAustria:log.GDP -1.63186 4.35215 -0.375 0.7080
## countrySlovenia:log.GDP -4.77991 10.02886 -0.477 0.6340
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.704 on 304 degrees of freedom
## Multiple R-squared: 0.2442, Adjusted R-squared: 0.1871
## F-statistic: 4.272 on 23 and 304 DF, p-value: 1.546e-09
Here we simplify our model to exclude, PctYouth, log(GDP), and all variable interactions.
Results from the ANOVA table produced below suggest that there is a significant difference in methamphetamine consumption among the European countries examined in this study, with Germany consuming the most methamphetamine and Slovenia consuming the least (F=4.98, df=7, p<0.05). Since there were no significant interaction effects between reference country : PctYouth or reference country : log(GDP), our hypothesis that the effect of reference country on methamphetamine usage is a function of PctYouth and log(GDP) was not supported.
#Recreate model without GDP, age, and interaction terms
methmod2 <- lm(data = methplot, log.Weekday.mean ~ country)
#Generate ANOVA table
anova(methmod2)## Analysis of Variance Table
##
## Response: log.Weekday.mean
## Df Sum Sq Mean Sq F value Pr(>F)
## country 7 35.208 5.0298 9.8049 4.437e-11 ***
## Residuals 320 164.156 0.5130
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#Generate summary
summary(methmod2)##
## Call:
## lm(formula = log.Weekday.mean ~ country, data = methplot)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.1838 -0.3430 -0.2013 0.2574 2.6140
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.645587 0.104473 25.323 < 2e-16 ***
## countrySpain -0.002881 0.136701 -0.021 0.98320
## countrySweden -0.015333 0.212400 -0.072 0.94250
## countrySwitzerland 0.442196 0.151144 2.926 0.00368 **
## countryFinland 0.326372 0.137586 2.372 0.01828 *
## countryGermany 0.840799 0.140567 5.981 5.91e-09 ***
## countryAustria -0.136474 0.179692 -0.759 0.44812
## countrySlovenia -0.082527 0.239895 -0.344 0.73106
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7162 on 320 degrees of freedom
## Multiple R-squared: 0.1766, Adjusted R-squared: 0.1586
## F-statistic: 9.805 on 7 and 320 DF, p-value: 4.437e-11
Here we replot our methamphetamine data using predicted values from our linear model
# Plot by country first
# Make new x values with unique country values, and mean age and GDP (standardizes it)
new.x5 <- expand.grid(
country = unique(methplot$country),
PctYouth = mean(methplot$PctYouth),
log.GDP = mean(methplot$log.GDP))
#New y values predicted from the model
new.y5 <- predict(methmod2, newdata = new.x5, interval = "confidence")
#Add the new x and y to a data frame called addThese5
addThese5 <- data.frame(new.x5, new.y5)
#Plot amphetamine consumption by country using predicted values
ggplot(data = addThese5, aes(x = country, y = fit)) +
geom_point(size = 4) +
geom_errorbar(data = addThese5, aes(ymin = lwr, ymax = upr), size = 0.4, stat = "identity") +
xlab("Country") +
ylab("Mean Methmphetamine Consumed (log(mg/1000 people/day))") +
ggtitle("Mean Methamphetamine Consumed by European Country") +
theme_bw()Lastly, we investigated MDMA usage.
To perform this analysis, we created a subset of the MDMA data in a new dataframe called MDMAplot.
#Create new dataframe containing only data pertaining to MDMA
MDMAplot <- data %>%
filter(drug == "MDMA") %>%
select(country, PctYouth, log.Weekday.mean, log.GDP)
#Check data
glimpse(MDMAplot)## Rows: 330
## Columns: 4
## $ country <fct> "Belgium", "Belgium", "Spain", "Spain", "Spain", "Swe…
## $ PctYouth <dbl> 12.1, 12.1, 10.2, 10.2, 10.2, 13.5, 12.0, 12.0, 12.0,…
## $ log.Weekday.mean <dbl> 3.722556, 2.645465, 2.968875, 2.302585, 2.660959, 2.3…
## $ log.GDP <dbl> 26.98217, 26.98217, 28.02223, 28.02223, 28.02223, 27.…
Here are the summary statistics (mean, sample size, and standard error by country) for the MDMA data.
#Create dataframe with mean MDMA levels, sample size, and standard error by country
plot5 <- MDMAplot %>%
group_by(country) %>%
summarise(
mean = mean(log.Weekday.mean, na.rm = TRUE),
samplesize = n(),
SE = std.error(log.Weekday.mean)
)
plot5## # A tibble: 8 × 4
## country mean samplesize SE
## <fct> <dbl> <int> <dbl>
## 1 Belgium 3.33 46 0.0860
## 2 Spain 2.98 64 0.0480
## 3 Sweden 2.98 11 0.162
## 4 Switzerland 3.13 49 0.0407
## 5 Finland 2.98 68 0.0386
## 6 Germany 3.05 58 0.0531
## 7 Austria 2.91 24 0.0495
## 8 Slovenia 2.92 10 0.0874
Here, the log(drug usage), PctYouth, and log(GDP) data associated with MDMA are presented as simple point plots.
For the first plot of reference country vs MDMA consumption, Sweden appears to consume the least amount of MDMA while Belgium appears to consume the most. The PctYouth vs MDMA consumption plot does not appear to show any relationship, and neither does the log(GDP) vs MDMA plot.
#Plot MDMA data by country
ggplot(data = MDMAplot, aes(x = country, y = log.Weekday.mean)) +
geom_point(size = 4) +
xlab("Country") +
ylab("MDMA Consumed (log(mg/1000 people/day))") +
ggtitle("MDMA Consumed by European Country") +
theme_bw()#Plot MDMA data by age
ggplot(data = MDMAplot, aes(x = PctYouth, y = log.Weekday.mean)) +
geom_point() +
xlab("Percent Youth (Ages 15-24)") +
ylab("MDMA consumed (log(mg/1000 people/day))") +
ggtitle("MDMA Consumed vs Percent Youth") +
theme_classic()#Plot MDMA data by GDP
ggplot(data = MDMAplot, aes(x = log.GDP, y = log.Weekday.mean)) +
geom_point() +
xlab("GDP (log(USD))") +
ylab("MDMA consumed (log(mg/1000 people/day))") +
ggtitle("MDMA Consumed vs GDP") +
theme_classic()Here we fit a linear model, where we hypothesize that the effect of reference country on MDMA usage is a function of PctYouth and GDP, using the variables from the MDMAplot data frame. To address our hypothesis, we are particularly interested the interaction effects between reference country : PctYouth and reference country : log(GDP).
#Create Model
mdmamod <- lm(data = MDMAplot, log.Weekday.mean ~ country + PctYouth + log.GDP + country:PctYouth + country:log.GDP)Before we can run the linear model with MDMA data, we first had to verify that the assumptions of this analysis were met. In the residuals vs. fitted plot (top left), there does not appear to be any major hump-shapes or valleys, suggesting the linear model may be a good fit. The Normal QQ plot (top right) isn’t perfect, the positive residuals seem to be larger than expected, however, most of the other residuals fall on the line suggesting that these data are approximately normal. The scale-location plot (bottom left) does not show any obvious pattern, suggesting that our data have approximately equal variance. Finally, the Constant Leverage plot (bottom right) shows no obvious influential data points. Thus, we were ready to continue with the linear model for MDMA.
#Check assumptions
autoplot(mdmamod, smooth.colour = NA)## Warning: Removed 330 row(s) containing missing values (geom_path).
## Warning: Removed 330 row(s) containing missing values (geom_path).
## Warning: Removed 1 rows containing missing values (geom_point).
Here, we produce an ANOVA table and summary table to help interpret our model.
First looking at the ANOVA table produced below, most of the variation in MDMA usage is explained by PctYouth (Mean sq value of 1.68). Some variation is also explained by reference country (Mean sq of 0.62). log(GDP) and all variable interactions did not capture any significant amount of variation. With this in mind, we will simplify our model to exclude log(GDP) and the interaction terms in the following section.
#Generate ANOVA table and summary
anova(mdmamod)## Analysis of Variance Table
##
## Response: log.Weekday.mean
## Df Sum Sq Mean Sq F value Pr(>F)
## country 7 5.127 0.7324 5.1533 1.465e-05 ***
## PctYouth 1 4.593 4.5928 32.3168 3.056e-08 ***
## log.GDP 1 0.293 0.2926 2.0588 0.1523
## country:PctYouth 7 0.907 0.1296 0.9120 0.4973
## country:log.GDP 7 0.266 0.0380 0.2673 0.9662
## Residuals 306 43.488 0.1421
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(mdmamod)##
## Call:
## lm(formula = log.Weekday.mean ~ country + PctYouth + log.GDP +
## country:PctYouth + country:log.GDP, data = MDMAplot)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.7918 -0.2443 -0.0384 0.1841 1.3732
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 21.08331 28.31814 0.745 0.4571
## countrySpain -13.60951 37.94324 -0.359 0.7201
## countrySweden 12.51519 109.77401 0.114 0.9093
## countrySwitzerland -18.52768 72.38106 -0.256 0.7981
## countryFinland -1.77287 35.40797 -0.050 0.9601
## countryGermany -2.05565 42.41088 -0.048 0.9614
## countryAustria 28.60916 64.49811 0.444 0.6577
## countrySlovenia 139.90580 156.09754 0.896 0.3708
## PctYouth -0.46911 0.20502 -2.288 0.0228 *
## log.GDP -0.45609 1.03823 -0.439 0.6608
## countrySpain:PctYouth -0.35218 0.37198 -0.947 0.3445
## countrySweden:PctYouth 0.09349 0.24589 0.380 0.7041
## countrySwitzerland:PctYouth 0.26412 0.25080 1.053 0.2931
## countryFinland:PctYouth 0.12617 0.25063 0.503 0.6150
## countryGermany:PctYouth -0.15621 0.51337 -0.304 0.7611
## countryAustria:PctYouth 0.45198 0.40127 1.126 0.2609
## countrySlovenia:PctYouth -0.58775 2.70478 -0.217 0.8281
## countrySpain:log.GDP 0.58043 1.39615 0.416 0.6779
## countrySweden:log.GDP -0.50944 4.09896 -0.124 0.9012
## countrySwitzerland:log.GDP 0.56165 2.62139 0.214 0.8305
## countryFinland:log.GDP -0.01347 1.30349 -0.010 0.9918
## countryGermany:log.GDP 0.13117 1.46699 0.089 0.9288
## countryAustria:log.GDP -1.28246 2.33766 -0.549 0.5837
## countrySlovenia:log.GDP -5.55083 5.52587 -1.005 0.3159
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.377 on 306 degrees of freedom
## Multiple R-squared: 0.2046, Adjusted R-squared: 0.1448
## F-statistic: 3.422 on 23 and 306 DF, p-value: 5.187e-07
Here we simplify our model to exclude log(GDP) and the interaction terms.
Results from the ANOVA table produced below suggest that there is a significant difference in MDMA consumption among the European countries examined in this study, with Belgium consuming the most MDMA and Slovenia consuming the least (F=4.09, df=7, p<0.05). Age also had a significant effect on MDMA consumption, with a lower proportion of youth in the population being associated with higher amounts of MDMA usage (F=11.1, df=1, p<0.05). Since there were no significant interaction effects between reference country : PctYouth or reference country : log(GDP), our hypothesis that the effect of reference country on MDMA usage is a function of PctYouth and log(GDP) was not supported.
#Recreate model without GDP, age, and interaction terms
MDMAmod2 <- lm(data = MDMAplot, log.Weekday.mean ~ country + PctYouth)
#Generate ANOVA table
anova(MDMAmod2)## Analysis of Variance Table
##
## Response: log.Weekday.mean
## Df Sum Sq Mean Sq F value Pr(>F)
## country 7 5.127 0.7324 5.2296 1.151e-05 ***
## PctYouth 1 4.593 4.5928 32.7956 2.354e-08 ***
## Residuals 321 44.954 0.1400
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#Generate summary
summary(MDMAmod2)##
## Call:
## lm(formula = log.Weekday.mean ~ country + PctYouth, data = MDMAplot)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.79226 -0.26697 -0.04389 0.20348 1.39415
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.29886 0.69583 10.489 < 2e-16 ***
## countrySpain -1.01336 0.13695 -7.399 1.21e-12 ***
## countrySweden -0.21835 0.12769 -1.710 0.0882 .
## countrySwitzerland -0.33219 0.08045 -4.129 4.65e-05 ***
## countryFinland -0.35859 0.07149 -5.016 8.74e-07 ***
## countryGermany -0.66749 0.10025 -6.659 1.20e-10 ***
## countryAustria -0.65860 0.10354 -6.361 6.91e-10 ***
## countrySlovenia -1.25628 0.19800 -6.345 7.57e-10 ***
## PctYouth -0.34097 0.05954 -5.727 2.35e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3742 on 321 degrees of freedom
## Multiple R-squared: 0.1778, Adjusted R-squared: 0.1573
## F-statistic: 8.675 on 8 and 321 DF, p-value: 9.954e-11
Here we replot our MDMA data using predicted values from our linear model
# Plot by country first
# Make new x values with unique country values, and mean age and GDP (standardizes it)
new.x6 <- expand.grid(
country = unique(MDMAplot$country),
PctYouth = mean(MDMAplot$PctYouth),
log.GDP = mean(MDMAplot$log.GDP))
#New y values predicted from the model
new.y6 <- predict(MDMAmod2, newdata = new.x6, interval = "confidence")
#Add the new x and y to a data frame called addThese6
addThese6 <- data.frame(new.x6, new.y6)
#Plot MDMA consumption by country using predicted values
ggplot(data = addThese6, aes(x = country, y = fit)) +
geom_point(size = 4) +
geom_errorbar(data = addThese6, aes(ymin = lwr, ymax = upr), size = 0.4, stat = "identity") +
xlab("Country") +
ylab("Mean MDMA Consumed (log(mg/1000 people/day))") +
ggtitle("Mean MDMA Consumed by European Country") +
theme_bw()#Now do the same with age
#Make new x values
new.x7 <- expand.grid(
country = unique(MDMAplot$country),
PctYouth = seq(from=min(MDMAplot$PctYouth), to=max(MDMAplot$PctYouth) +1),
log.GDP = mean(MDMAplot$log.GDP))
#New y values predicted from the model
new.y7 <- predict(MDMAmod2, newdata = new.x7, interval = "confidence")
#Add the new x and y to a data frame called addThese7
addThese7 <- data.frame(new.x7, new.y7)
#Plot age vs cocaine consumption including countries
ggplot(data = addThese7, aes(x = PctYouth, y = fit, group = country, color = country)) +
geom_point() +
geom_smooth(method="lm") +
xlab("Percent Youth (Ages 15-24)") +
ylab("Mean MDMA Consumed (log(mg/1000 people/day))") +
ggtitle("Mean MDMA Consumed vs Percent Youth") +
theme_classic()## `geom_smooth()` using formula 'y ~ x'